LIMIT THEOREMS FOR VON MISES STATISTICS OF 
A MEASURE PRESERVING TRANSFORMATION 



MANFRED DENKER AND MIKHAIL GORDIN 

Abstract. For a measure preserving transformation $T$ of a probability
space $(X, \mathcal{F}, \mu)$ we investigate almost sure and distributional
convergence of random variables of the form

$$x \mapsto \frac{1}{C_n} \sum_{i_1<n,\dots,i_d<n} f(T^{i_1}x,\dots,T^{i_d}x), \quad n=1,2,\dots,$$

where $f$ (called the kernel) is a function from $X^d$ to $\mathbb{R}$ and
$C_1, C_2, \dots$ are appropriate normalizing constants. We observe that the
above random variables are well defined and belong to $L_r(\mu)$ provided
that the kernel is chosen from the projective tensor product

$$L_p(X_1,\mathcal{F}_1,\mu_1) \otimes_\pi \cdots \otimes_\pi L_p(X_d,\mathcal{F}_d,\mu_d) \subset L_p(\mu^d)$$

with $p = dr$, $r \in [1, \infty)$. We establish a form of the individual
ergodic theorem for such sequences. Next, we give a martingale
approximation argument to derive a central limit theorem in the
non-degenerate case (in the sense of the classical Hoeffding's
decomposition). Furthermore, for $d = 2$ and a wide class of canonical
kernels $f$ we also show that the convergence holds in distribution
towards a quadratic form $\sum_{m=1}^{\infty} \lambda_m \eta_m^2$ in independent
standard Gaussian variables $\eta_1, \eta_2, \dots$. Our results on the
distributional convergence use a $T$-invariant filtration as a prerequisite
and are derived from uni- and multivariate martingale approximations.



1. Introduction 

This paper aims to extend the theory of von Mises statistics for 
independent, identically distributed random variables to the case of 
stationary processes, beyond the case of weakly dependent random 
variables. Below, we begin with a brief and informal account of the
main points of the paper, with special emphasis on possible novelties.
Then we pass to a discussion of the historical background, the
motivation of our approach and the content of the sections.

Date: August 27, 2011.

1991 Mathematics Subject Classification. 60F05, 37A30, 60B10, 60A10, 60F15,
28D05, 28A33, 62E20, 62G05.

Key words and phrases. Von Mises statistic, measure preserving transformation,
projective tensor product, ergodic theorem, Hoeffding's decomposition, central limit
theorem, martingale approximation.

1.1. A sketch of the topic, the approach and the results in the 
paper. Let $T$ be a measure preserving transformation of a probability
space $(X, \mathcal{F}, \mu)$. For $d \ge 1$ and some function
$f : X^d \to \mathbb{R}$, called the kernel, we investigate, under
appropriate normalization, the asymptotic behavior of random variables

(1)  $$x \mapsto \sum_{0 \le i_1 < n,\dots,0 \le i_d < n} f(T^{i_1}x,\dots,T^{i_d}x), \quad n = 1,2,\dots,$$

as $n$ tends to $\infty$.

Every function of the type of (1), normalized by some constant or
not, will be called a von Mises statistic (or simply a V-statistic) for
the transformation $T$ and the kernel $f$. Notice that the same class of
statistics is determined by symmetric kernels, so we will assume that
$f$ is symmetric whenever it is needed.
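
To make the definition concrete, here is a minimal Python sketch of the unnormalized sum (1). It is only an illustration: the helper names `orbit` and `v_statistic`, the cyclic shift on $\{0,\dots,4\}$ and the product kernel are hypothetical choices, not objects taken from the paper.

```python
from itertools import product

def orbit(T, x, n):
    """Return the orbit segment [x, T x, ..., T^{n-1} x]."""
    points = []
    for _ in range(n):
        points.append(x)
        x = T(x)
    return points

def v_statistic(f, T, x, n, d):
    """Unnormalized sum (1): f evaluated on all d-tuples of the first n orbit points."""
    pts = orbit(T, x, n)
    return sum(f(*tup) for tup in product(pts, repeat=d))

# Hypothetical toy example (an assumption, not from the paper):
# a cyclic shift on {0,...,4} and the product kernel f(a, b) = a * b.
shift = lambda x: (x + 1) % 5
kernel = lambda a, b: a * b
total = v_statistic(kernel, shift, 0, 5, 2)  # equals (0 + 1 + 2 + 3 + 4) ** 2
```

For a product kernel the double sum factorizes into a square of an ordinary ergodic sum, which is why `total` can be checked by hand.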

In general a measurable function $f : X^d \to \mathbb{R}$ does not have a
well-defined restriction to a subset of measure zero. Such restrictions are
needed to define (1). The first objective is to determine a sufficiently
large class of measurable functions $f$ on $X^d$ for which (1) is
meaningful as a measurable function on $X$ (in fact, a class of equivalent
measurable functions as usual in measure theory). We do not assume
any additional structures (topology, distance, differential structure) to
determine "nice" kernels. Instead, we use purely measure theoretical
and related functional analytic concepts. Our solution to this problem
is formulated in terms of the projective tensor products of the spaces
$L_p(\mu)$. This is the first novel viewpoint in the present paper. As a first
application we prove a new version of the individual ergodic theorem
for sums of type (1) for functions described by the projective tensor
product.

Our next results constitute the heart of the paper. They are based
on the martingale approximation technique with respect to a filtration
compatible with the transformation. Although some reformulations and
generalizations are possible (see subsection 1.4 of this Introduction),
here we only consider the case when a decreasing filtration is defined
by

$$\mathcal{F}_n = T^{-n}\mathcal{F}, \quad n \ge 0$$

(therefore, only non-invertible $T$ are of interest). We also assume
for simplicity that $T$ is ergodic. Thus we consider stationary (reversed)
martingale differences adapted to this filtration and, moreover, their
multiparameter generalizations. These generalizations turn out to be some
stationary random fields of canonical kernels (see Section 4 for the
definition) enjoying a certain form of the multiparameter reversed
martingale property. A formalism of multiparameter martingale-coboundary
decompositions, recently developed in [30], is applied to the tensor
product spaces, yielding under appropriate assumptions upper bounds for
norms of the sums in (1). Along with classical results on stationary
martingale differences, this leads to the Central Limit Theorem (CLT) for
the kernels known as non-degenerate ones. A more precise terminology would
be the CLT with a non-degenerate normalization: we do not guarantee the
non-degeneracy of the limit; instead, we give an expression for the
limiting variance (which can vanish). This parallels the CLT for Markov
processes [29j EH] based on the solution of the Poisson equation (which, in
fact, appears in the present paper in a multidimensional form and plays a
similar role). This form of the CLT for V-statistics of a measure
preserving transformation is another new result in the paper.

Finally, the last new result asserts that under some conditions on
a degree two canonical kernel $f$ the limiting distribution of properly
normalized sums (1) is identical to that of a finite or infinite rank
diagonal finite trace quadratic form in independent standard Gaussian
variables; the coefficients of the form are the eigenvalues of a (unique)
martingale type canonical kernel which emerges as the main term in the
martingale-coboundary decomposition (mentioned above) of $f$. This
description of the limiting distribution is exactly the same as in the
i.i.d. case and does not involve correlated Gaussian variables as in a
number of papers on V-statistics of dependent variables.¹

Notice that no theorems on multiparameter martingale differences
are needed in the present paper, although such random arrays emerge
in the course of the proofs. All results used reduce to classical Doob
inequalities and the Billingsley-Ibragimov Central Limit Theorem, both
treating the one-parameter case.

¹ When the present paper had been completed, the paper [35] was brought to
the authors' attention, where a particular case of our result is established
(see the discussion at the end of subsection 1.3 for more details).

1.2. V- and U-statistics of i.i.d. variables and related limit
theory. Let $\xi = (\xi_n)_{n \ge 0}$ be a sequence of independent
identically distributed (i.i.d.) random variables defined on some
probability space and taking values in a set $Y$ endowed with a
$\sigma$-field $\mathcal{G}$. A function $F : Y^d \to \mathbb{R}$, called
the kernel, determines averages of the form

(2)  $$\frac{1}{n^d} \sum_{0 \le i_1 < n,\dots,0 \le i_d < n} F(\xi_{i_1},\dots,\xi_{i_d}).$$

Such averages were introduced by R. von Mises in [15]. Assume that
$F$, $\xi$ and $(Y, \mathcal{G})$ ensure that (2) is a well-defined random
variable on the probability space where $\xi_0, \xi_1, \dots$ are defined
(this issue will be discussed below). Random variables of the form (2) are
called von Mises statistics or V-statistics. We shall use this terminology
for the sums in (2), normalized in some way or not. The original
statistical motivation of von Mises was as follows. The expression in (2)
can be viewed as the integral of $F$ with respect to the $d$-th Cartesian
power of the normalized empirical measure of the sample
$(\xi_0, \dots, \xi_{n-1})$ (this measure puts the weight $k/n$ on every
point having multiplicity $k$ in the multiset $\{\xi_0, \dots, \xi_{n-1}\}$).
Then (2) generalizes to a certain class of probability measures $\nu$ (or
even signed measures) on $(Y, \mathcal{G})$ as a $d$-fold integral yielding
the functional

(3)  $$\nu \mapsto F(\nu) = \int_{Y^d} F(y_1,\dots,y_d)\,\nu(dy_1)\cdots\nu(dy_d).$$

Notice that we obtained a polynomial functional of degree $d$ on measures
on $(Y, \mathcal{G})$. Under appropriate assumptions such functionals are in
one-to-one correspondence with symmetric kernels $F$. Let $\nu$ denote the
distribution of any of the $\xi_0, \xi_1, \dots$ and let $\nu_n$ be the
empirical measure of the sample $(\xi_0, \dots, \xi_{n-1})$. In fact von
Mises considered, using the tools of infinite-dimensional differential
calculus, more general smooth nonlinear functionals (possibly
non-polynomial). For such a functional $\Phi$, von Mises used the statistic
$\Phi(\nu_n)$ as a statistical estimate for $\Phi(\nu)$. In the papers [16]
by von Mises and [21], [22] by Filippova it is shown that under certain
assumptions the investigation of the asymptotic distribution of
$\Phi(\nu_n) - \Phi(\nu)$ reduces to the study of the same asymptotic
characteristics of the random variables (1). We do not go into details
here about the definition of the symmetric kernel $F$ and the degree $d$.
They are determined by the Taylor expansion of $\Phi$ along with the
probability measure $\nu$. In fact the passage to $F$ is completely parallel
to Hoeffding's decomposition and the concept of (non-)degeneracy in
the theory of U-statistics. Some kind of Hoeffding's decomposition in
the form we need in this paper will be considered in Section 4.

We continue discussing the class of symmetric kernels $F$ which can
be considered for a given $\nu$. Notice that an assumption of the type
$F \in L_p(Y^d, \mathcal{G}^d, \nu^d)$ does not even guarantee that all
summands in (2) are well-defined. For example, $F(\xi_0, \xi_1)$ may be
well-defined while $F(\xi_0, \xi_0)$ may be not. More precisely, if
$(Y, \mathcal{G}, \nu)$ is not a purely atomic space and
$F_i : Y^2 \to \mathbb{R}$, $i = 1, 2$, are two measurable functions which
agree $(\nu \times \nu)$-almost everywhere, their restrictions to the main
diagonal $D = \{(y,y) : y \in Y\}$ may fail to be measurable functions on
$D$ agreeing almost everywhere (the measurable structure and the measure on
$D$ are assumed to be induced from $(Y, \mathcal{G}, \nu)$ by means of the
map $y \mapsto (y,y)$). This leads to additional requirements on a
symmetric kernel for results on limit distributions of canonical
V-statistics of degree 2. In [37], for example, the assumption
$\int |F(y,y)|\,\nu(dy) < \infty$ is made. This condition (which includes
the existence of a meaningful restriction of $F$ to the main diagonal in
$Y \times Y$ and the integrability of this restriction) does not have a
simple operator-theoretic reformulation. A stronger condition we are mostly
using in the present paper is equivalent for $d = 2$ to the property that
$F$ determines a trace class integral operator in the space $L_2(\nu)$.

Notice that a parallel but somewhat different development is due to
Halmos [32] and Hoeffding [33]. They introduced U-statistics, defined
for $n \ge d$ by a formula similar to (2) but where the summation is only
extended over all strictly increasing $d$-tuples of indices:

(4)  $$\binom{n}{d}^{-1} \sum_{0 \le i_1 < i_2 < \dots < i_d < n} F(\xi_{i_1},\dots,\xi_{i_d}).$$

In the case of i.i.d. random variables $\xi_0, \xi_1, \dots$ the statistics
(4) give unbiased estimates of (3) while V-statistics are in general merely
asymptotically unbiased estimates of (3). Another attractive feature of
U-statistics is that there is no need, unlike in the case of V-statistics,
to consider restrictions of $F$ to some measure zero subsets of
$(Y^d, \mathcal{G}^d, \nu^d)$ as we just did for V-statistics, restricting
the kernel $F$ to the diagonal $D$. Due to this property of U-statistics in
the i.i.d. case the assumptions about the kernel in limit theorems are in
general weaker and more natural for U-statistics than for V-statistics.²
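
The difference between the averages (2) and (4) can be seen on a tiny sample. The following Python sketch is a hedged illustration only: the function names `v_stat` and `u_stat`, the kernel $F(a,b) = (a-b)^2/2$ and the sample are assumptions chosen for the example and do not come from the paper. For this kernel the V-statistic is the variance of the empirical measure, while the U-statistic is the usual unbiased sample variance.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def v_stat(F, sample, d):
    """Average (2): F over all d-tuples of sample points, divided by n**d."""
    n = len(sample)
    total = sum(F(*t) for t in product(sample, repeat=d))
    return Fraction(total) / n**d

def u_stat(F, sample, d):
    """Average (4): F over strictly increasing index d-tuples, divided by C(n, d)."""
    n = len(sample)
    total = sum(F(*t) for t in combinations(sample, d))
    return Fraction(total) / comb(n, d)

# Hypothetical symmetric kernel and sample (assumptions for illustration).
half_sq_diff = lambda a, b: Fraction((a - b) ** 2, 2)
sample = [1, 2, 3, 4]

v_value = v_stat(half_sq_diff, sample, 2)  # includes the diagonal terms F(y, y) = 0
u_value = u_stat(half_sq_diff, sample, 2)  # never evaluates F on the diagonal
```

Only `v_stat` ever evaluates the kernel at points of the diagonal, which is the source of the extra assumptions on $F$ discussed above.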

² In the case of dependent $\xi_0, \xi_1, \dots$ the advantages (and even the
definition) of U-statistics are much less obvious. We do not investigate
U-statistics in the present paper since this requires a substantial change
of the approach.

The theory of U- and V-statistics for i.i.d. variables is well developed
(see [371 [T^j and references therein). It starts with Hoeffding's
decomposition of the kernel as a nonlinear function of i.i.d. variables
[M], a representation analogous to the "Wiener chaos" in the case of the
Brownian motion. This decomposition splits the kernel into a sum of
polynomial functionals of various degrees (called canonical or totally
degenerate) each of which is orthogonal to all functionals of lower
degrees. The so-called "non-degenerate" kernels define U- and V-statistics
whose "additive part" (or the "first order part") is in fact a sum of
centered i.i.d. variables. This additive part asymptotically dominates and
leads to results similar to those which are known for sums of i.i.d. random
variables (a delicate problem of bounding the influence of the non-additive
remainder has to be solved here); in the degenerate case there is no
"additive part" at all; the canonical kernels of different degrees
$d \ge 2$ are studied separately, the limiting distribution being described
in terms of quadratic forms in Gaussian variables (for degree two) or in
terms of stochastic integrals of multiplicity $d$ with respect to the
Brownian motion or the Brownian bridge; for a kernel which is the sum of
such parts of different order the non-vanishing term of the lowest order
makes an asymptotically dominating contribution to the corresponding
V-statistic. Thus, the theory includes results of the type of the Strong
Law of Large Numbers (SLLN), the Central Limit Theorem (CLT) for the
non-degenerate case and results on the asymptotic distribution of canonical
statistics. Also functional versions of some results, large deviations and
almost sure invariance principles are considered in the literature.
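
For degree two, the limit laws mentioned above are those of quadratic forms of the type $\sum_m \lambda_m \eta_m^2$ in independent standard Gaussian variables $\eta_m$, as stated in the abstract. The following Python sketch samples from such a form in a finite-rank case; the function names, the eigenvalues `lambdas`, the seed and the number of draws are all hypothetical choices made for illustration. Since $E\eta_m^2 = 1$, the mean of the form equals the trace $\sum_m \lambda_m$, which the empirical mean should approximate.

```python
import random

def sample_quadratic_form(eigenvalues, rng):
    """One draw of sum_m lambda_m * eta_m**2 with i.i.d. standard normal eta_m."""
    return sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)

def empirical_mean(eigenvalues, n_draws, seed=0):
    """Monte Carlo estimate of the mean of the quadratic form."""
    rng = random.Random(seed)
    draws = [sample_quadratic_form(eigenvalues, rng) for _ in range(n_draws)]
    return sum(draws) / n_draws

# Hypothetical finite-rank spectrum (an assumption, not taken from the paper).
lambdas = [1.0, 0.5, 0.25]
mean_estimate = empirical_mean(lambdas, 20000)  # should be close to sum(lambdas)
```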

Degenerate von Mises statistics for independent variables were
first treated by von Mises in [16] and Filippova in [25]. Von Mises
studied special cases, while it is shown in the latter paper that the
convergence to some multiple integral with respect to the Brownian
bridge (Wiener chaos) holds (under suitable moment conditions on the
canonical kernels in the decomposition). Neuhaus jH] proved a functional
form of the weak convergence for degenerate kernels of degree
2. Although he treated U-statistics only, the method applies as well to
von Mises statistics with properly modified limit distributions. In [21]
the functional form of Filippova's result is obtained with the
distributional limit presented by multiple stochastic integrals with
respect to the Kiefer-Müller process. Many fine results on U-statistics
(maximal inequalities, large deviations, functional CLT) are included or
surveyed in P3] and |3Q].

1.3. Limit theory for V- and U-statistics of dependent variables.
For non-independent random variables some progress has been
made for weakly dependent and associated processes (see [15], [16] and
references therein). More generally, the SLLN for von Mises statistics
of an arbitrary ergodic stationary process has been treated in
[1], where it is shown, among other important results and interesting
examples, that the averages in (2) for an ergodic strictly stationary
process $\xi = (\xi_n)_{n \ge 0}$, taking real values with the
one-dimensional distribution $\nu$, converge a.s. to the expression (3),
the assumptions ranging from continuity of the kernel to the weak
Bernoulli property of $\xi$. One of the results in [1] on von Mises
statistics is a SLLN provided the kernel is bounded by a product of
functions in separate variables. In the case of functionals of mixing
processes a form of the SLLN has been proven in [9] which is not contained
in [1]. In almost all other papers the CLT (sometimes together with its
functional form) has been considered. Yoshihara [UJ was the first to give a
probabilistic treatment of the CLT question when the process is absolutely
regular. Other mixing conditions are investigated in
|H 13 [HI El El UHl SHI SH SS|. Functionals of absolutely regular
processes have been studied in [19]. See [20] for an application of these
results to a new method of constructing asymptotically distribution free
confidence intervals for the correlation dimension (see [31]). Later many
limit results have been considerably improved in [9] and [10] by
establishing a functional form of the central limit theorem. In the weakly
dependent case we mention the works of Babbel [51 H] and Amanov [3] where
various types of mixing conditions are considered, including strong mixing.
The above list is incomplete; more information is contained in the surveys
[UJ and [TB].

Most of these results are based on mixing conditions; in particular,
absolute regularity is assumed by many authors in order to employ the
coupling method. Besides, the theory for the dependent case tries to
follow the pattern of the i.i.d. case. However, non-degeneracy in terms
of the classical Hoeffding's expansion no longer implies the
non-degeneracy of the limit in the CLT. Also in the degenerate case for
$d = 2$, similar to the i.i.d. case, the expansion of the kernel $F$ over
the eigenfunctions of the related integral operator is used in many papers.
Such an approach, however, in the dependent case does not allow enough
control over the correlation properties of the random sequences in question,
which leads to a rather fuzzy description of the limit distribution as
that of a quadratic form in correlated Gaussian variables. Notice that
in a recent paper [38], independently of our research, for a certain class
of kernels of degree 2 the limit distribution of V-statistics is derived
which has the same form as in the i.i.d. case.³ This viewpoint agrees
with ours and the program we develop in this paper (ignoring such
inessential details in [38] as the requirement of positive definiteness of
the kernel and the use of topology on $X$). However, kernels of this class
are not generic in the sense that they all belong to a closed subspace of
infinite codimension in the space of canonical kernels from
$L_2^{sym}(\nu \times \nu)$, while the kernels in Theorem H] of the present
paper are dense in the latter space. The method of reduction of a generic
kernel to a martingale type one based on the so-called
martingale-coboundary decomposition is one of the main points of the
present paper.⁴

³ The limit theorem in [38] only provides a probabilistic basis for
statistical applications developed in that paper. Let us remark that the
ergodic process in [35] should be assumed to generate a strictly growing
filtration; otherwise, only the zero kernel satisfies the conditions of the
theorem there.

1.4. The approach and the method of presentation in the paper.
Here we explain why and how we pass from stationary sequences
to measure preserving transformations, what role is played by tensor
product spaces in the description of classes of suitable kernels and,
finally, why and how we use processes generated by non-invertible
transformations instead of adapted processes in the invertible setup.

Let $\xi = (\xi_n)_{n \in \mathbb{Z}}$ be a strictly stationary random
sequence taking values in a measurable space $(Y, \mathcal{G})$. Any such
sequence $\xi$ can be thought of as defined on a probability space
$(X, \mathcal{F}, P)$ equipped with a measure preserving invertible
transformation $T$ so that

$$\xi_n = \xi_0 \circ T^n, \quad n \in \mathbb{Z}.$$

For a measurable symmetric function $F : Y^d \to \mathbb{R}$ let us set

$$f(x_1,\dots,x_d) = F(\xi_0(x_1),\dots,\xi_0(x_d)).$$

Then, for $n \ge d$, the statistic

(5)  $$\sum_{0 \le i_1 \le n-1,\dots,0 \le i_d \le n-1} F(\xi_{i_1},\dots,\xi_{i_d})$$

(known as a V-statistic of the sample $(\xi_1,\dots,\xi_n)$) is of the form (1).

Obviously, the mapping (1) is a generalization of (5). Moreover, a
natural idea is to develop the limit theory for V-statistics directly in
terms of the transformation $T$ and the kernel $f$, excluding $\xi$ and $F$.
This should lead to a better understanding of the properties
characterizing "nice" kernels, while in the present theory these properties
are distributed among the process $\xi$ and the function $F$ in an unclear
way. As will be seen, it is possible to exclude the process $\xi$
completely from the proposition on the SLLN. As to the results on the
distributional convergence, the process $\xi$ is replaced by a filtration
compatible with $T$.

⁴ More precisely, in the setup of the present paper the kernels in [35] are contained

1 1 21 

among symmetric elements of the space M\ ' 2 ' n in the proof of our Lemma SJ The 
latter kernels are exactly those symmetric kernels which generate, by means of 
the dynamics, stationary fields of reversed martingale differences. For such kernels 
the series reduces to one term, hence we have nothing to check to apply our 
Theorem SJ 
Here, we follow this approach and start with a measure preserving
transformation $T$ of a probability space $(X, \mathcal{F}, \mu)$. The basic
problem is to identify some class of kernels $f$ for which (1) makes sense.
We first discuss the domain of $f$.

When dealing with measure preserving transformations it seems to
be a natural idea to generate the expressions of the form

(6)  $$f(T^{k_1}x,\dots,T^{k_d}x)$$

by means of an appropriate dynamics. In doing so we first consider an
action on $X^d$ of $d$ commuting copies $T_1, \dots, T_d$ of the
transformation $T$. Here and in the sequel $T_i : X^d \to X^d$ acts as $T$
on the $i$-th component of $(x_1, \dots, x_d) \in X^d$ and does not change
other components, $i = 1, \dots, d$. We consider an action of
$T_1, \dots, T_d$ on some set $Z \subset X^d$ to produce terms of the form

$$f(T^{\mathbf{k}}\mathbf{x}) = f(T_1^{k_1} \cdots T_d^{k_d}\mathbf{x}) = f(T^{k_1}x_1,\dots,T^{k_d}x_d),$$

$$\mathbf{x} = (x_1,\dots,x_d) \in Z, \quad \mathbf{k} = (k_1,\dots,k_d) \in \mathbb{Z}_+^d,$$

from a function $f$ defined on $Z$. Next, we restrict such an expression
to the principal diagonal

$$D = \{(x,\dots,x) : x \in X\} \subset X^d$$

and obtain the desired term. The minimal requirements which $Z$ must
satisfy are:

i) $T_i Z \subset Z$, $i = 1, \dots, d$;

ii) $D \subset Z$.
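
The construction above, with commuting coordinate maps acting on $X^d$ followed by evaluation at a diagonal point, can be sketched in a few lines of Python. The helper names `coordinate_map`, `apply_multi` and `iterate` and the toy map on $\{0,\dots,6\}$ are hypothetical and serve only as an illustration; points of $X^d$ are modelled as tuples with 0-based coordinates.

```python
def coordinate_map(T, i):
    """T_i: apply T to the i-th coordinate (0-based) of a tuple, leave the rest unchanged."""
    def T_i(point):
        return point[:i] + (T(point[i]),) + point[i + 1:]
    return T_i

def apply_multi(T, k, point):
    """Apply T_1^{k_1} ... T_d^{k_d} to a point of X^d given as a tuple."""
    for i, k_i in enumerate(k):
        T_i = coordinate_map(T, i)
        for _ in range(k_i):
            point = T_i(point)
    return point

def iterate(T, n, x):
    """Return T^n x."""
    for _ in range(n):
        x = T(x)
    return x

# Hypothetical toy map on {0,...,6} (an assumption, not from the paper).
T = lambda x: (3 * x + 1) % 7
x = 2
diagonal_point = (x, x, x)                        # a point of the principal diagonal D
moved = apply_multi(T, (1, 0, 2), diagonal_point)  # (T x, x, T^2 x)
```

Evaluating a kernel `f` at `moved` then gives a term of the form (6) with $(k_1, k_2, k_3) = (1, 0, 2)$.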

In this paper we examine only one of the possible routes by confining
our consideration to the case when $Z = X^d$. More precisely,
let $(X_1, \mathcal{F}_1, \mu_1), \dots, (X_d, \mathcal{F}_d, \mu_d)$ be
copies of the probability space $(X, \mathcal{F}, \mu)$ and
$(X^d, \mathcal{F}^d, \mu^d)$ be their direct product. Thus, we are looking
for a class of functions $f : X^d \to \mathbb{R}$ for which expression (6)
makes sense. Observe that even if $f : X^d \to \mathbb{R}$ is a bounded
measurable function on $X^d$, in general we cannot restrict it in a correct
way to a set of measure zero. In particular, we face this problem in (6)
where the corresponding set is

$$\bigcup_{x \in X,\ (k_1,\dots,k_d) \in \mathbb{Z}_+^d} \{(T^{k_1}x,\dots,T^{k_d}x)\}$$

which has measure zero for a non-atomic $(X, \mathcal{F}, \mu)$ and
$d \ge 2$. The reason is exactly of the same type as in the case of
V-statistics for i.i.d. sequences. In the dynamical setup and in connection
with V-statistics a related example is given in [1], Example 4.6.
Notice that we also would like to avoid employing any additional
structures on $X$ (topological, metrical, differential) and stay within
measure theory and related functional analysis. It turned out that
satisfactory classes of functional spaces of kernels with the desired
properties are given by projective tensor products (see [43])
$L_p(X_1, \mathcal{F}_1, \mu_1) \otimes_\pi \cdots \otimes_\pi L_p(X_d, \mathcal{F}_d, \mu_d) \subset L_p(\mu^d)$
with an appropriate choice of the exponent $p \in [1, \infty)$. In
particular, we can consider elements of such a space as functions on $X^d$
and restrict them to some diagonal-type subsets of $X^d$ in a correct way.
Though the class of the kernels we use here is wide enough, a search for
wider classes with similar properties is desirable (probably,
simultaneously with the search for a satisfactory definition of
U-statistics in the dependent case). There is no simple criterion for
a kernel to belong to a certain projective tensor product space. However,
for $d = 2$ a symmetric kernel belongs to
$L_2(X, \mathcal{F}, \mu) \otimes_\pi L_2(X, \mathcal{F}, \mu)$
if and only if it is a kernel of a trace class integral operator. Such
kernels were extensively studied in various settings (some results and
references can be found in [2S]). Once we have chosen a class of kernels
to deal with, it is natural to apply this choice and prove some form of
the SLLN for V-statistics of a probability preserving transformation.
We included in the paper a result of this kind for completeness.
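
The simplest elements of a projective tensor product are finite sums of product functions $g \otimes h$, $(g \otimes h)(x_1, x_2) = g(x_1)h(x_2)$. For such kernels the sum (1) with $d = 2$ factorizes into products of ordinary Birkhoff sums, which only involve functions of one variable and are therefore well defined. The Python sketch below checks this identity on a toy example; the function names, the map `T` and the pairs in `terms` are hypothetical choices made for illustration, not objects from the paper.

```python
def birkhoff_sum(g, T, x, n):
    """Sum of g(T^k x) over k = 0, ..., n-1."""
    total = 0
    for _ in range(n):
        total += g(x)
        x = T(x)
    return total

def v_sum_direct(f, T, x, n):
    """Sum (1) for d = 2: f over all pairs of the first n orbit points."""
    pts = []
    for _ in range(n):
        pts.append(x)
        x = T(x)
    return sum(f(a, b) for a in pts for b in pts)

def v_sum_factored(terms, T, x, n):
    """Same sum for f = sum of g (x) h over the pairs (g, h) in terms."""
    return sum(birkhoff_sum(g, T, x, n) * birkhoff_sum(h, T, x, n)
               for g, h in terms)

# Hypothetical toy data (assumptions for illustration only).
T = lambda x: (x + 1) % 6
terms = [(lambda a: a, lambda b: b * b), (lambda a: 1, lambda b: b)]
f = lambda a, b: sum(g(a) * h(b) for g, h in terms)  # f(a, b) = a*b**2 + b

direct = v_sum_direct(f, T, 0, 4)
factored = v_sum_factored(terms, T, 0, 4)
```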

The main goal of the present paper is to prove distributional results
for V-statistics of a probability preserving transformation such as the
CLT and a theorem about the asymptotic distribution for degenerate
kernels. Instead of various mixing conditions we prefer using a form
of martingale approximation. Experience with the CLT for stationary
processes has shown that the martingale approximation plays a unifying
and simplifying role; a large number of results originally proved
in terms of mixing conditions can be deduced this way. A well-known
approach here is to use a form of representation of the original process
as a sum of a stationary martingale difference (or reversed martingale
difference) and a so-called coboundary. The latter is a sequence of
increments of another stationary process. For this reason the contribution
of the coboundary to the sum of consecutive values of the process is
negligible in view of the normalization of the sums by constants tending
to infinity. So, the limit behavior of the original sums is the same
as that of sums of martingale differences. To the latter process we can
apply the well-known limit theorems of Billingsley and Ibragimov.
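
The reason why a coboundary is negligible is a telescoping identity: if $f = h + g - g \circ T$, then $\sum_{k<n} f \circ T^k = \sum_{k<n} h \circ T^k + g - g \circ T^n$. The Python sketch below verifies only this algebraic identity; the map `T` and the functions `h` and `g` are hypothetical, and `h` is not claimed to be a martingale difference here.

```python
def partial_sum(g, T, x, n):
    """Sum of g(T^k x) over k = 0, ..., n-1."""
    total = 0
    for _ in range(n):
        total += g(x)
        x = T(x)
    return total

def iterate(T, n, x):
    """Return T^n x."""
    for _ in range(n):
        x = T(x)
    return x

# Hypothetical toy data (assumptions for illustration only).
T = lambda x: (5 * x + 3) % 11
h = lambda x: x * x - 3               # stands in for the "main" part
g = lambda x: 2 * x + 1               # transfer function of the coboundary
f = lambda x: h(x) + g(x) - g(T(x))   # f = h + (g - g o T)

def coboundary_remainder(x, n):
    """Difference between the sums of f and of h; equals g(x) - g(T^n x)."""
    return partial_sum(f, T, x, n) - partial_sum(h, T, x, n)
```

The remainder stays bounded by twice the size of `g` whatever `n` is, so it disappears after division by a normalizing constant tending to infinity.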

Next, to apply the martingale approximation we need a filtration
compatible with the dynamics. For example, for an invertible measure
preserving transformation $T'$ of a probability space
$(X', \mathcal{F}', \mu')$ we would like to have a sequence of
$\sigma$-fields
$\cdots \subset \mathcal{F}'_{-1} \subset \mathcal{F}'_0 \subset \mathcal{F}'_1 \subset \cdots \subset \mathcal{F}'$
satisfying the compatibility condition
$T'^{-1}\mathcal{F}'_n = \mathcal{F}'_{n+1}$, $n \in \mathbb{Z}$. Next, an
$\mathcal{F}'_0$-measurable function $f'$ defines a stationary sequence
$(f' \circ T'^n)_{n \in \mathbb{Z}}$ adapted to the filtration
$(\mathcal{F}'_n)_{n \in \mathbb{Z}}$. An $\mathcal{F}'_0$-measurable
function $h' \in L_2(\mu')$ defines a stationary sequence of martingale
differences if its conditional expectation with respect to
$\mathcal{F}'_{-1}$ vanishes: $E(h' \mid \mathcal{F}'_{-1}) = 0$. Notice that
a nonzero function $h'$ can only exist if the filtration
$(\mathcal{F}'_n)_{n \in \mathbb{Z}}$ is strictly increasing. Furthermore,
the existence of such a strictly increasing filtration compatible with $T'$
is equivalent to the property that the entropy of $T'$ is positive.

However, mostly for notational convenience, we choose another
equivalent setup for our presentation. Namely, let $T$ be a measure
preserving transformation of a probability space $(X, \mathcal{F}, \mu)$.
The transformation $T$ is no longer assumed to be invertible; moreover,
only non-invertible transformations lead to non-trivial statements. We
introduce a canonical decreasing filtration $(\mathcal{F}_n)_{n \ge 0}$ by
setting $\mathcal{F}_n = T^{-n}\mathcal{F}$, $n \ge 0$, so that
$\mathcal{F}_{n+1} = T^{-1}\mathcal{F}_n$, $n \ge 0$. A measurable function
$f$ on $X$ defines a stationary sequence $(f \circ T^n)_{n \ge 0}$.
Stationary reversed martingale differences emerge now as sequences of the
form $(h \circ T^n)_{n \ge 0}$, where $h \in L_2(\mu)$ is a function
satisfying $E(h \mid \mathcal{F}_1) = 0$.
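
As a concrete illustration, which is an assumption of this example and not taken from the paper, consider the doubling map $Tx = 2x \bmod 1$ on $[0,1)$ with Lebesgue measure. Here $T^{-1}\mathcal{F}$ consists of the measurable sets invariant under $x \mapsto x + 1/2 \bmod 1$, so the conditional expectation with respect to $\mathcal{F}_1$ is the average over the two points $x$ and $x + 1/2$. The Python sketch below checks on a dyadic grid that the first binary digit function has vanishing conditional expectation, while a function of the form $h \circ T$ is left unchanged; all names in the code are hypothetical.

```python
from fractions import Fraction

def doubling(x):
    """Doubling map T x = 2x mod 1 on [0, 1) (illustrative assumption)."""
    return (2 * x) % 1

def cond_exp_F1(h):
    """E(h | T^{-1}F) for the doubling map with Lebesgue measure:
    the average of h over x and x + 1/2 mod 1, the two points with the same image."""
    half = Fraction(1, 2)
    return lambda x: (h(x) + h((x + half) % 1)) / 2

# +1 on [0, 1/2) and -1 on [1/2, 1): determined by the first binary digit of x.
rademacher = lambda x: 1 if x < Fraction(1, 2) else -1
# A function of T x only, hence measurable with respect to T^{-1}F.
second_digit = lambda x: rademacher(doubling(x))

grid = [Fraction(k, 16) for k in range(16)]
is_reversed_md = all(cond_exp_F1(rademacher)(x) == 0 for x in grid)
is_F1_measurable = all(cond_exp_F1(second_digit)(x) == second_digit(x) for x in grid)
```

In this example the sequence `rademacher(T^n x)` is the sequence of binary digits of `x` coded as ±1, a standard instance of a stationary reversed martingale difference sequence.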

For the interested reader we briefly explain the correspondence
between these two descriptions. We assume here and in the rest of the
paper that our probability spaces are standard (which does not result
in a loss of generality in the sense of possible joint distributions
of random variables under consideration). Thus, the usual constructions
of ergodic theory can be performed (we use freely the corresponding
terminology). Then, starting with a probability preserving
transformation $T$ acting on $(X, \mathcal{F}, \mu)$ we can construct its
(invertible) natural extension [12] $\widehat{T}$ acting on a space
$(X', \mathcal{F}', \mu')$. This construction also delivers a measure
preserving measurable mapping $\pi : X' \to X$ such that
$\pi \circ \widehat{T} = T \circ \pi$ (putting it differently, $\pi$
represents $T$ as a factor-transformation of the invertible transformation
$\widehat{T}$). Then set $T' = \widehat{T}^{-1}$ and
$\mathcal{F}'_0 = \pi^{-1}\mathcal{F}$. Taking into account that
$\mathcal{F}' = \bigvee_{n \ge 0} T'^{-n}\mathcal{F}'_0$ holds for
a natural extension, we can establish the one-to-one correspondence
$f \circ \pi \leftrightarrow f$ between measurable functions $f$ on $X$ and
$\mathcal{F}'_0$-measurable functions on $X'$. By means of the latter
functions every stationary sequence on $X'$ adapted to
$(\mathcal{F}'_n)_{n \in \mathbb{Z}}$ can be represented in the form
$(f' \circ T'^n)_{n \in \mathbb{Z}}$. Observe here that the time reversal
needs to be handled with some care with respect to limit theorems; however,
the distributional limits of sums relative to a stationary probability are
the same for the original and reversed processes. This explains how to pass
from the non-invertible setup to the adapted invertible one. For the
passage in the opposite direction, from adapted sequences generated by an
invertible transformation to a non-invertible setup, we need to use the
construction of the factor-transformation of $T'^{-1}$ with respect to the
$\sigma$-field $\mathcal{F}'_0$.

Remark 1. One of the simplifying assumptions we will make about $T$
is exactness [42], which means that
$\bigcap_{n \ge 0} \mathcal{F}_n = \mathcal{N}$, where $\mathcal{N}$ is the
trivial sub-$\sigma$-field of the probability space. Assuming ergodicity
(which is implied by exactness), the requirement of exactness can be
removed if we restrict our consideration to the functions whose conditional
expectation with respect to $\bigcap_{n \ge 0} \mathcal{F}_n$ vanishes. A
similar restriction should also be imposed on kernels. We prefer to assume
that $T$ is exact whenever it simplifies the presentation. The analogous
property of the filtration $(\mathcal{F}'_n)_{n \in \mathbb{Z}}$ compatible
with an invertible map $T'$ is known as regularity and consists of the
relation $\bigcap_{n \le 0} \mathcal{F}'_n = \mathcal{N}'$. The assumption
of ergodicity can also be removed if we agree to consider the
distributional convergence to a mixture of centered Gaussian distributions
in the CLT for the non-degenerate case and similar, more complicated
distributions for canonical kernels of degree $d \ge 2$.

Remark 2. Under appropriate assumptions, the non-adapted case can also
be considered in the framework of invertible dynamics on the basis
of martingale approximation and coboundaries [27]. We do not give
exact assertions here. We include in the paper only one example showing
how the proposed machinery works without explicit use of mixing
conditions. For this reason as well, and in view of the intention to
present our approach in a relatively simple setting, we restrict ourselves
in this paper to some of the most basic limit theorems.

1.5. Contents of the paper. Finally, we outline the content of the 
paper. 

Section 2 contains a very brief introduction to the theory of
projective tensor products of Banach spaces and some elementary results
on the restriction operator (a kind of rudimentary Sobolev-type
embedding theorem) which are necessary for the remaining part of the
paper. We were not able to locate the latter results in the literature,
at least in the form we need them. The proofs are straightforward.

Section 3 concerns the Strong Law of Large Numbers (SLLN) for
V-statistics (1) of a probability preserving transformation. The SLLN
holds under assumptions about the kernel formulated in terms of the
tensor products of the spaces $L_p(\mu)$ ($p \ge 1$), where $\mu$ is an
invariant probability for the transformation. The proof uses classical
maximal inequalities for the Birkhoff sums.
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Section H] gives a short resume of Hoeffding's decomposition. This de- 
composition applies to projective tensor products of the spaces L p (n) 
and is stable with respect to the dynamical operators and their ad- 
joints. It will become clear in further sections that the components 
of Hoeffding's decomposition play a determining role in developing the 
martingale-coboundary decompositions. 

Section [5] presents a brief survey of known properties of exact
transformations. Together with Section [4] this is preparatory material for
the study of distributional limits in the rest of the paper.

In Section [6] some bounds for multiparameter Birkhoff-type sums
are deduced. The limit theorems in the last two sections are based on 
these estimates. The main tool used here is the formalism of multiple 
martingale-coboundary decompositions from [30] applied to the situ- 
ation of the projective tensor products. Elementary properties of the 
latter products are also heavily used, along with a classical inequality 
for one-dimensional martingales. 

The first distributional limit theorem (Section [7]) treats the case
of non-degenerate kernels. A CLT for von Mises statistics is derived by
combining the classical Billingsley-Ibragimov theorem, the one-dimensional
martingale-coboundary decomposition and the bounds of the previous
section, which are based on a multiparameter extension of this decomposition.
Another CLT result is also included there; it parallels a known
one-dimensional theorem in which the moment requirements are partially
replaced by a bound for the first moment of sums.

Finally, in Section [8] we consider symmetric kernels $f$ in dimension
$d = 2$ which are degenerate (or canonical). We derive distributional
convergence to the sum $\sum_{n=1}^{\infty} \lambda_n \xi_n^2$, where $(\xi_k)_{k \ge 1}$ are independent
standard Gaussian variables and $(\lambda_k)_{k \ge 1}$ are the eigenvalues of an auxiliary
kernel considered as an integral operator. This auxiliary kernel enjoys the
two-parameter reversed martingale properties and appears as a result
of applying the martingale-coboundary representation to the original
kernel $f$. A simple example illustrates this theorem at the end of Section [8].

2. Preliminaries 

2.1. Multiple actions. Let $T$ be a measure preserving transformation
of a probability space $(X, \mathcal{F}, \mu)$ (which is assumed to be standard, that
is, a Lebesgue space in the sense of Rokhlin [12]). For every $p \in [1, \infty]$
we set $L_p(\mu) = L_p(X, \mathcal{F}, \mu)$ and define an isometry $V : L_p(\mu) \to L_p(\mu)$
by the relation $Vf = f \circ T$. For every $p \in [1, \infty)$ let $V^* : L_{p'}(\mu) \to
L_{p'}(\mu)$ be the adjoint operator of $V : L_p(\mu) \to L_p(\mu)$, where $p^{-1} + p'^{-1} = 1$.
Simplifying the notation and terminology, the preadjoint operator
(acting in $L_1(X, \mathcal{F}, \mu)$) of the operator $V : L_\infty(\mu) \to L_\infty(\mu)$ will be
called the adjoint of $V$ and denoted by $V^*$ whenever this does not
lead to a misunderstanding. Analogous notation and agreements will
be applied to other measure spaces, their transformations and related
operators.

For $i = 1, \dots, d$ let $(X_i, \mathcal{F}_i, \mu_i, T_i)$ be the $i$-th copy of $(X, \mathcal{F}, \mu, T)$
and $V_i, V_i^*$ be the corresponding operators. The direct product
$\prod_{1 \le i \le d} (X_i, \mathcal{F}_i, \mu_i)$ will be denoted by $(X^d, \mathcal{F}^d, \mu^d)$. We will use this
shorter notation even when the probability spaces $(X_1, \mathcal{F}_1, \mu_1), \dots,
(X_d, \mathcal{F}_d, \mu_d)$ are not necessarily the same whenever this does not cause
an ambiguity; a more detailed notation will be used when necessary.
The notation $L_p(\mu^d)$ should be understood correspondingly. Let $\mathbb{Z}_+ =
\{0, 1, \dots\}$. For every $\mathbf{n} = (n_1, \dots, n_d) \in \mathbb{Z}_+^d$ we set $T^{\mathbf{n}}(x_1, \dots, x_d) =
(T_1^{n_1} x_1, \dots, T_d^{n_d} x_d)$. Define a representation of the semigroup $\mathbb{Z}_+^d$ by
isometries in $L_p(\mu^d)$ via

$$V^{\mathbf{n}} f = f \circ T^{\mathbf{n}}, \quad f \in L_p(\mu^d).$$

We do not assume that the transformation $T$ is invertible. The CLT
proved below will hold for the class of essentially noninvertible $T$ (known
as exact transformations). The family of adjoint operators $(V^{\mathbf{n}*})_{\mathbf{n} \in \mathbb{Z}_+^d}$
is also a representation of $\mathbb{Z}_+^d$ (by coisometries in this case). Note
that these two representations do not commute with each other in the
noninvertible case (otherwise they clearly commute).

2.2. Tensor products and products of functions. The main objec- 
tive in this subsection is to discuss conditions on kernels under which 
V-statistics are well-defined. We first recall the concept of the pro- 
jective tensor product of Banach spaces. For general definitions and 
results we refer to [33]. 

Let $B_1, \dots, B_d$ be Banach spaces with norms $|\cdot|_{B_1}, \dots, |\cdot|_{B_d}$ and let
$B_1 \otimes \cdots \otimes B_d$ be their algebraic tensor product. Elements of $B_1 \otimes \cdots \otimes B_d$
representable in the form $f_1 \otimes \cdots \otimes f_d$ are called elementary tensors.
The projective tensor product of two or more Banach spaces, denoted by
$B_1 \otimes_\pi \cdots \otimes_\pi B_d$, can be described as a completion of the algebraic tensor
product with respect to the projective norm. The latter is defined as
follows. Recall that, by definition, a norm on $B_1 \otimes \cdots \otimes B_d$ is said
to be a cross norm whenever it equals $\prod_{i=1}^{d} |f_i|_{B_i}$ for every elementary
tensor $f_1 \otimes \cdots \otimes f_d$. Then the projective norm is the exact upper bound
of all cross norms on $B_1 \otimes \cdots \otimes B_d$.

For $i = 1, \dots, d$ let $(X_i, \mathcal{F}_i, \mu_i)$ be probability spaces. For $p_1, \dots, p_d \in
[1, \infty]$ we denote by $|\cdot|_{p_1, \dots, p_d, \pi}$ the norm of the space

$$L_{p_1}(X_1, \mathcal{F}_1, \mu_1) \otimes_\pi \cdots \otimes_\pi L_{p_d}(X_d, \mathcal{F}_d, \mu_d).$$

In particular, if $p_1 = \dots = p_d = p \in [1, \infty]$, the projective tensor
product is denoted by

$$L_{p,\pi}(\mu^d) := L_p(X_1, \mathcal{F}_1, \mu_1) \otimes_\pi \cdots \otimes_\pi L_p(X_d, \mathcal{F}_d, \mu_d)$$

with the norm $|\cdot|_{p,d,\pi}$. More detailed notation will be used when necessary.
Elements of the space $L_{p,\pi}(\mu^d)$, at least for $1 \le p < \infty$, can be
thought of as functions from $L_p(\mu^d)$ with some additional "nice" properties.

Lemma 1. For every $p \in [1, \infty]$ there exists a canonical linear map

$$J_d : L_{p,\pi}(\mu^d) \to L_p(\mu^d)$$

of norm one such that every elementary tensor $f_1 \otimes \cdots \otimes f_d$ maps to
the function $(x_1, \dots, x_d) \mapsto f_1(x_1) \cdots f_d(x_d)$.

Moreover, if $p \ne \infty$, then $J_d$ contractively embeds $L_{p,\pi}(\mu^d)$ into
$L_p(\mu^d)$ as a dense subspace.

Proof. The map $J_d$ is constructed in the following way. First, sending every
$d$-tuple $(f_1, \dots, f_d)$ to the function $(x_1, \dots, x_d) \mapsto f_1(x_1) \cdots f_d(x_d)$,
we define a $d$-linear map (of norm one) from $L_p(X_1, \mathcal{F}_1, \mu_1) \times \cdots \times
L_p(X_d, \mathcal{F}_d, \mu_d)$ to $L_p(X^d, \mathcal{F}^d, \mu^d)$. Then, by the universality property of
the projective tensor product (see [43], Theorem 2.9, for $d = 2$, and
use induction and associativity for $d > 2$) this map extends to $L_{p,\pi}(\mu^d)$
uniquely with norm one. Denote the resulting map by $J_d$ as well. For
$p \ne \infty$ its image is dense in $L_p(\mu^d)$, since so is the image under $J_d$ of the
algebraic tensor product. Injectivity of the map $J_d$ is a more delicate
question. It is closely related to the approximation property (see
[43], Chapter 4), which is shared by all $L_p$-spaces. We restrict ourselves
to the case $1 \le p < \infty$. Then the conclusion can be easily deduced
from Proposition 4.6 in [43]. Thus, for $1 \le p < \infty$, $J_d$ contractively
embeds $L_{p,\pi}(\mu^d)$ into $L_p(\mu^d)$ as a dense subspace. □

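The space $L_{p,\pi}(\mu^d)$ is preserved by the operators $(V^{\mathbf{n}}, V^{\mathbf{n}*})_{\mathbf{n} \in \mathbb{Z}_+^d}$. Let
$(V_\pi^{\mathbf{n}}, V_\pi^{\mathbf{n}*})_{\mathbf{n} \in \mathbb{Z}_+^d}$ denote the restrictions of $(V^{\mathbf{n}}, V^{\mathbf{n}*})_{\mathbf{n} \in \mathbb{Z}_+^d}$ to $L_{p,\pi}(\mu^d)$. The
same operators can be described as projective tensor products. For
example, for $\mathbf{n} = (n_1, \dots, n_d)$ we have

$$V_\pi^{\mathbf{n}} = V_1^{n_1} \otimes_\pi \cdots \otimes_\pi V_d^{n_d}, \qquad V_\pi^{\mathbf{n}*} = V_1^{*n_1} \otimes_\pi \cdots \otimes_\pi V_d^{*n_d}.$$

These relations are consequences of general properties of the projective
tensor products of Banach spaces and operators ([43], Chapter 2).
The operators $(V_\pi^{\mathbf{n}}, V_\pi^{\mathbf{n}*})_{\mathbf{n} \in \mathbb{Z}_+^d}$ have properties very similar to those of
$(V^{\mathbf{n}}, V^{\mathbf{n}*})_{\mathbf{n} \in \mathbb{Z}_+^d}$; in particular, they have norm one with respect to the
projective tensor norm. From now on we shall use $(V^{\mathbf{n}}, V^{\mathbf{n}*})_{\mathbf{n} \in \mathbb{Z}_+^d}$ to
denote both families of operators.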

Example. For $p = 2$ and $d = 2$ the space $L_{2,\pi}(\mu^2)$ can be identified with
the space of (kernels of) the trace class operators from $L_2(\mu)^*$ to $L_2(\mu)$.
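A finite-dimensional sanity check of this identification can be sketched as follows. It is an illustration only and not part of the original argument: the spaces $L_2(\mu)$ are replaced by Euclidean spaces with their plain norms (counting measure rather than a probability measure), where the projective norm of a matrix is its nuclear (trace) norm. The function name `projective_norm_l2` and all sample data are hypothetical.

```python
import numpy as np

def projective_norm_l2(a):
    """Nuclear (trace) norm of a matrix: the sum of its singular values.

    For Euclidean spaces this coincides with the projective tensor norm.
    """
    return float(np.linalg.svd(a, compute_uv=False).sum())

rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=5), rng.normal(size=7)

# An elementary tensor f1 (x) f2 is the rank-one matrix f1 f2^T;
# a cross norm must give it the product of the two norms.
elementary = np.outer(f1, f2)
cross_value = float(np.linalg.norm(f1) * np.linalg.norm(f2))

# For a general kernel the projective norm dominates the L_2
# (Hilbert-Schmidt / Frobenius) norm, reflecting the contractive map J_2.
kernel = rng.normal(size=(5, 7))
hilbert_schmidt = float(np.linalg.norm(kernel))
nuclear = projective_norm_l2(kernel)
```

The comparison `nuclear >= hilbert_schmidt` is the finite-dimensional shadow of the statement that $J_2$ has norm one.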

2.3. Restriction to the diagonal. For every $p_1, \dots, p_d \in [1, \infty]$ with
$p_1^{-1} + \cdots + p_d^{-1} = 1$ and for every element $f \in L_{p_1}(\mu) \otimes_\pi \cdots \otimes_\pi L_{p_d}(\mu)$,
we define a function $D_d f \in L_1(\mu)$, so that the latter function plays
the role of the restriction of $f$ to the principal diagonal $\{(x_1, \dots, x_d) :
x_1 = \dots = x_d\} \subset X^d$. This (in a slightly more general form) will
be achieved in the following proposition. We do not deal with the
interpretation of $f$ as a function defined on $X^d$ except for the case
$1 \le p_1 = \dots = p_d = p < \infty$, when due to the existence of the
natural embedding (Lemma [1]) the space $L_p(\mu) \otimes_\pi \cdots \otimes_\pi L_p(\mu)$ can be
considered as a subspace of $L_p(\mu^d)$. In this particular case the term
"restriction" can be justified by an approximation procedure described
in Proposition [2] below.

Proposition 1. Let $p_1, \dots, p_d \in [1, \infty]$, $r \in [1, \infty]$ satisfy

(7) $$\sum_{l=1}^{d} p_l^{-1} = r^{-1}.$$

Then
(1) the map $\mathcal{D}$, sending every $d$-tuple $(f_1, \dots, f_d) \in L_{p_1}(\mu) \times \cdots
\times L_{p_d}(\mu)$ to the function

$$x \mapsto f_1(x) \cdots f_d(x),$$

is a norm 1 $d$-linear map from $L_{p_1}(\mu) \times \cdots \times L_{p_d}(\mu)$ to $L_r(\mu)$;
(2) there exists a unique linear map (of norm one)

$$D_d : L_{p_1}(\mu) \otimes_\pi \cdots \otimes_\pi L_{p_d}(\mu) \to L_r(\mu)$$

such that for every $d$-tuple $(f_1, \dots, f_d) \in L_{p_1}(\mu) \times \cdots \times L_{p_d}(\mu)$

$$D_d(f_1 \otimes \cdots \otimes f_d) = \mathcal{D}(f_1, \dots, f_d).$$

Proof. The first assertion is a consequence of the multiple Hölder
inequality ([22], Exercise 6.11.2)

$$|f_1 \cdots f_d|_r \le |f_1|_{p_1} \cdots |f_d|_{p_d}.$$

The second one follows from the universality property of the projective
tensor products with respect to polylinear maps. For the case
of bilinear maps see ([IS], Theorem 2.9), and then use induction and
associativity of the projective tensor product for $d > 2$. □
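The multiple Hölder inequality used in the first assertion can be checked numerically on a finite probability space. The following is a minimal sketch under assumptions that are not in the original text: the space has 50 points with random weights, $d = 3$, and the exponents $3, 4, 6$ are chosen so that $r = 4/3$. The helper `lp_norm` is a hypothetical name.

```python
import numpy as np

def lp_norm(f, mu, p):
    """L_p norm of f with respect to the probability vector mu."""
    if np.isinf(p):
        return float(np.max(np.abs(f[mu > 0])))
    return float(np.sum(mu * np.abs(f) ** p) ** (1.0 / p))

rng = np.random.default_rng(1)
mu = rng.random(50)
mu /= mu.sum()                       # probability weights on a 50-point space

exponents = [3.0, 4.0, 6.0]          # 1/3 + 1/4 + 1/6 = 3/4, hence r = 4/3
r = 1.0 / sum(1.0 / p for p in exponents)

fs = [rng.normal(size=50) for _ in exponents]
diagonal = np.prod(fs, axis=0)       # x -> f_1(x) f_2(x) f_3(x)

lhs = lp_norm(diagonal, mu, r)
rhs = float(np.prod([lp_norm(f, mu, p) for f, p in zip(fs, exponents)]))
```

Here `lhs <= rhs` is the norm-one property of the $d$-linear map for this particular choice of functions.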

If $p_1 = \dots = p_d = p$, the space $L_{p,\pi}(\mu^d) = L_p(\mu) \otimes_\pi \cdots \otimes_\pi L_p(\mu)$
is canonically mapped to $L_p(\mu^d)$ by Lemma [1]. For every finite measurable
partition $\mathcal{A} = \{A_1, \dots, A_m\}$ let us denote by $\mathcal{F}_{\mathcal{A}}$ the $\sigma$-field
of all possible unions of atoms of $\mathcal{A}$ and by $E(\cdot|\mathcal{A})$ the corresponding
conditional expectation. Let $(\mathcal{A}_n)_{n \ge 1}$ be a refining sequence of finite
measurable partitions $\mathcal{A}_n = \{A_{1,n}, \dots, A_{m_n,n}\}$ such that $\mathcal{F}$ is the smallest
$\sigma$-field containing all $\mathcal{F}_{\mathcal{A}_n}$, $n \ge 1$. Let $I_A$ denote the indicator of the
set $A$.

Proposition 2. Let $d \ge 1$, $p_1 = \dots = p_d = p \in [d, \infty)$ and $r = p/d$.
Define the sequence $(D_{d,n})_{n \ge 1}$ of operators $D_{d,n} : L_{p,\pi}(\mu^d) \to L_r(\mu)$ by

$$D_{d,n} f = \sum_{k=1}^{m_n} \frac{I_{A_{k,n}}}{\mu(A_{k,n})^d} \int_{A_{k,n}^d} f(x_1, \dots, x_d)\, \mu(dx_1) \cdots \mu(dx_d).$$

Then

$$D_{d,n} \to D_d \quad \text{as } n \to \infty$$

in the strong operator topology.

Proof. First let us verify (again using Hölder's inequality) that $D_{d,n}$ as
a map from $L_{p,\pi}(\mu^d)$ to $L_r(\mu)$ does not increase the norms of elementary
tensors. From the relation

(8) $$D_{d,n}(f_1 \otimes \cdots \otimes f_d) = D_d\bigl(E(f_1|\mathcal{A}_n) \otimes \cdots \otimes E(f_d|\mathcal{A}_n)\bigr) = E(f_1|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n)$$

it follows that

$$|D_{d,n}(f_1 \otimes \cdots \otimes f_d)|_r = |E(f_1|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n)|_r
\le |E(f_1|\mathcal{A}_n)|_p \cdots |E(f_d|\mathcal{A}_n)|_p \le |f_1|_p \cdots |f_d|_p.$$

By the properties of the projective norm, this implies that the norm
of every $D_{d,n}$ as an operator from $L_{p,\pi}(\mu^d)$ to $L_r(\mu)$ is also bounded by
one.

Now, using (8), standard properties of conditional expectations and
Hölder's inequality, we obtain

$$|D_{d,n}(f_1 \otimes \cdots \otimes f_d) - f_1 \cdots f_d|_r = |E(f_1|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n) - f_1 \cdots f_d|_r$$
$$\le |E(f_1|\mathcal{A}_n) E(f_2|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n) - f_1 E(f_2|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n)|_r$$
$$+ |f_1 E(f_2|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n) - f_1 f_2 E(f_3|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n)|_r + \cdots$$
$$+ |f_1 f_2 \cdots f_{d-1} E(f_d|\mathcal{A}_n) - f_1 f_2 \cdots f_{d-1} f_d|_r$$
$$\le |E(f_1|\mathcal{A}_n) - f_1|_p\, |E(f_2|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n)|_{pr/(p-r)}$$
$$+ |E(f_2|\mathcal{A}_n) - f_2|_p\, |f_1 E(f_3|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n)|_{pr/(p-r)}$$
$$+ |E(f_3|\mathcal{A}_n) - f_3|_p\, |f_1 f_2 E(f_4|\mathcal{A}_n) \cdots E(f_d|\mathcal{A}_n)|_{pr/(p-r)} + \cdots$$
$$+ |E(f_d|\mathcal{A}_n) - f_d|_p\, |f_1 \cdots f_{d-1}|_{pr/(p-r)}$$
$$\le \sum_{k=1}^{d} |E(f_k|\mathcal{A}_n) - f_k|_p \prod_{m=1,\, m \ne k}^{d} |f_m|_p.$$

From the martingale convergence theorem for the space $L_p$ we conclude
that every sequence $(D_{d,n}(f_1 \otimes \cdots \otimes f_d))_{n \ge 1}$ converges in the norm of
$L_r(\mu)$ to the function $f_1(\cdot) \cdots f_d(\cdot)$. The analogous conclusion holds for
finite linear combinations of elementary tensors. Since the norms of
the operators $D_{d,n}$ are uniformly bounded, the proposition follows. □
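The approximation procedure of the proposition can be illustrated numerically for an elementary tensor. The sketch below makes several assumptions that are not in the original text: the space is $[0, 1)$ discretised by a uniform grid, the partitions are dyadic, $d = p = 3$ (so $r = 1$), and the three factor functions are arbitrary smooth choices. The helper names `cond_exp_dyadic` and `D_dn` are hypothetical.

```python
import numpy as np

N = 2 ** 12                              # grid points on [0, 1), each of weight 1/N
x = (np.arange(N) + 0.5) / N

def cond_exp_dyadic(f, n):
    """E(f | A_n) for the dyadic partition of [0, 1) into 2**n intervals."""
    blocks = f.reshape(2 ** n, -1)       # one row per atom of the partition
    return np.repeat(blocks.mean(axis=1), blocks.shape[1])

def D_dn(fs, n):
    """D_{d,n} applied to the elementary tensor f_1 (x) ... (x) f_d."""
    return np.prod([cond_exp_dyadic(f, n) for f in fs], axis=0)

fs = [np.sin(2 * np.pi * x), x ** 2, np.exp(-x)]   # d = 3 factors
diagonal = np.prod(fs, axis=0)                      # f_1(x) f_2(x) f_3(x)

# L_1 distance between D_{3,n}(f_1 (x) f_2 (x) f_3) and the diagonal restriction.
errors = [float(np.mean(np.abs(D_dn(fs, n) - diagonal))) for n in range(9)]
```

As the partitions are refined, the entries of `errors` shrink, in line with the strong convergence $D_{d,n} \to D_d$.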

Thus, the function $D_d f \in L_r(\mu)$ is a well-defined substitute for the
naive restriction of $f$ to the principal diagonal. For example, for $\mathbf{n} =
(n_1, \dots, n_d)$ the function $D_d V^{\mathbf{n}} f$ can be viewed as a substitute for the
function

$$x \mapsto f(T^{n_1} x, \dots, T^{n_d} x).$$

3. Strong law of large numbers 

3.1. A multivariate ergodic theorem. If T is an ergodic transfor- 
mation of a probability space, a von Mises statistic may be considered 
as an estimate for the multiple integral of the kernel with respect to 
the invariant measure. Consistency is one of the desirable statistical 
properties of (a sequence of) estimates which immediately raises the 
question of an appropriate ergodic theorem. 

Proposition [3], the main result of this subsection, states such a theorem
in a general setting. It looks similar to some Wiener-type ergodic
theorems ([22], Theorem 8.6.9); the difference is that our result gives
convergence of certain multiparameter sums in ([9]) almost surely with
respect to some singular probability measure which is not invariant
under the multiparameter action which we are considering. This became
possible due to rather strong requirements imposed on the kernel.
Without such assumptions or some substitute for them we cannot even
ask whether the conclusion of Proposition [3] is true or not: it may
just make no sense for some more general kernels. On the other hand,
our result is not a generalization of Birkhoff's individual ergodic theorem
because of the requirement that $p > 1$ in the case that $d = 1$.
Moreover, the classical individual ergodic theorem is essentially used
in the proof. However, this result is sufficient for the applications to
the SLLN for von Mises statistics developed in the next subsection.

We do not assume here symmetry of the kernel and consider summation
over rectangular coordinate domains (which is common in the
multiparameter ergodic theorems, see [22], Chapter 8) rather than over
coordinate cubes involved in the definition of V-statistics. Moreover,
in this subsection (unlike other parts of the paper) we consider several
possibly different $\mu$-preserving transformations $T_{(1)}, \dots, T_{(d)}$ of the
space $(X, \mathcal{F}, \mu)$. Extending the previously introduced notation, we set

$$T^{(n_1, \dots, n_d)}(x_1, \dots, x_d) = (T_{(1)}^{n_1} x_1, \dots, T_{(d)}^{n_d} x_d) \quad \text{and} \quad V^{(n_1, \dots, n_d)} f = f \circ T^{(n_1, \dots, n_d)}.$$

This extension does not cause any additional difficulties. Such theorems
may be useful when comparing several dynamical systems with a
common invariant measure (starting at the same initial point).

Transformations considered in this subsection in general are not ergodic,
so we need some notation to treat the general case. Recall
that $A \in \mathcal{F}$ is said to be $T$-invariant if $T^{-1} A = A$ a.e. For every
$l \in \{1, \dots, d\}$ let $\mathcal{F}_{\mathrm{inv},l}$ denote the $\sigma$-field of all $T_{(l)}$-invariant
measurable sets in $(X, \mathcal{F}, \mu)$ and $E_{\mathrm{inv},l}$ be the corresponding conditional
expectation considered as an operator in $L_{p_l}(X, \mathcal{F}, \mu)$. Set

$$E_{\mathrm{inv}} = E_{\mathrm{inv},1} \otimes_\pi \cdots \otimes_\pi E_{\mathrm{inv},d}.$$

Proposition 3. Let $d \ge 1$ and suppose that $p_1, \dots, p_d \in (1, \infty]$, $r \in
[1, \infty]$ satisfy

$$\sum_{l=1}^{d} p_l^{-1} = r^{-1}.$$

Let $T_{(1)}, \dots, T_{(d)}$ be measure preserving transformations of the probability
space $(X, \mathcal{F}, \mu)$ and $f \in L_{p_1}(\mu) \otimes_\pi \cdots \otimes_\pi L_{p_d}(\mu)$. Then, as
$n_1 \to \infty, \dots, n_d \to \infty$, the random variables

(9) $$\frac{1}{n_1 \cdots n_d} \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_d \le n_d - 1} D_d V^{(k_1, \dots, k_d)} f$$

converge with probability 1 and, if $r < \infty$, in $L_r(\mu)$ to

(10) $$D_d (E_{\mathrm{inv},1} \otimes_\pi \cdots \otimes_\pi E_{\mathrm{inv},d}) f.$$

Remark 3. In order to see that the above limit equals ([10]), observe
that for $p_1, \dots, p_d \in (1, \infty)$ it is not hard to prove a multiple statistical
ergodic theorem asserting the strong convergence

$$\frac{1}{n_1 \cdots n_d} \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_d \le n_d - 1} V^{(k_1, \dots, k_d)} \to E_{\mathrm{inv},1} \otimes_\pi \cdots \otimes_\pi E_{\mathrm{inv},d} \quad \text{as } n_1 \to \infty, \dots, n_d \to \infty$$

of the operators acting on the space $L_{p_1}(X_1, \mathcal{F}_1, \mu_1) \otimes_\pi \cdots \otimes_\pi L_{p_d}(X_d, \mathcal{F}_d, \mu_d)$.
Applying the operator $D_d$ to both sides of this relation, we obtain
the convergence in the $L_r$-norm for $p_1 < \infty, \dots, p_d < \infty$ in Proposition [3].

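The next lemma will be used in the proof of Proposition [3].

Lemma 2. Let $d \ge 1$, $p_1, \dots, p_d$, $r$ and the transformations $T_{(1)}, \dots, T_{(d)}$
satisfy the conditions of Proposition [3]. Then there exists a constant
$C = C(p_1, \dots, p_d)$ such that for every $f \in L_{p_1}(\mu) \otimes_\pi \cdots \otimes_\pi L_{p_d}(\mu)$ we
have that

$$\Bigl| \sup_{1 \le n_1 < \infty,\ \dots,\ 1 \le n_d < \infty} \Bigl| \frac{\sum_{k_1=0}^{n_1-1} \cdots \sum_{k_d=0}^{n_d-1} D_d V^{(k_1, \dots, k_d)} f}{n_1 \cdots n_d} \Bigr| \Bigr|_r \le C\, |f|_{p_1, \dots, p_d, \pi}.$$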

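Proof. For the proof we will need the well-known bound ([22], Theorem
8.6.8)

$$\Bigl| \sup_{1 \le n < \infty} \Bigl| \frac{\sum_{k=0}^{n-1} V^k F}{n} \Bigr| \Bigr|_p \le C(p)\, |F|_p,$$

where $C(p)$ depends only on $p \in (1, \infty]$. This is the lemma for $d = 1$.
Let now $d \ge 2$. According to the properties of the projective tensor
norm ([15], Proposition 2.8), for every $f \in L_{p_1}(\mu) \otimes_\pi \cdots \otimes_\pi L_{p_d}(\mu)$
and $\epsilon > 0$ there exists a bounded family of functions $f_{i,l} \in L_{p_l}(\mu)$
($1 \le i < \infty$, $1 \le l \le d$) such that

$$f = \sum_i f_{i,1} \otimes \cdots \otimes f_{i,d}$$

and

$$\sum_i |f_{i,1}|_{p_1} \cdots |f_{i,d}|_{p_d} \le |f|_{p_1, \dots, p_d, \pi} + \epsilon.$$

Then, with all suprema taken over $1 \le n_1 < \infty, \dots, 1 \le n_d < \infty$ unless
indicated otherwise,

$$\Bigl| \sup \Bigl| (n_1 \cdots n_d)^{-1} \sum_{k_1=0}^{n_1-1} \cdots \sum_{k_d=0}^{n_d-1} D_d V^{(k_1, \dots, k_d)} f \Bigr| \Bigr|_r$$
$$\le \sum_i \Bigl| \sup \Bigl| (n_1 \cdots n_d)^{-1} \sum_{k_1=0}^{n_1-1} \cdots \sum_{k_d=0}^{n_d-1} D_d V^{(k_1, \dots, k_d)} (f_{i,1} \otimes \cdots \otimes f_{i,d}) \Bigr| \Bigr|_r$$
$$= \sum_i \Bigl| \sup \Bigl| \frac{\sum_{k_1=0}^{n_1-1} V_{(1)}^{k_1} f_{i,1}}{n_1} \cdots \frac{\sum_{k_d=0}^{n_d-1} V_{(d)}^{k_d} f_{i,d}}{n_d} \Bigr| \Bigr|_r$$
$$= \sum_i \Bigl| \sup_{1 \le n_1 < \infty} \Bigl| \frac{\sum_{k_1=0}^{n_1-1} V_{(1)}^{k_1} f_{i,1}}{n_1} \Bigr| \cdots \sup_{1 \le n_d < \infty} \Bigl| \frac{\sum_{k_d=0}^{n_d-1} V_{(d)}^{k_d} f_{i,d}}{n_d} \Bigr| \Bigr|_r$$
$$\le \sum_i \Bigl| \sup_{1 \le n_1 < \infty} \Bigl| \frac{\sum_{k_1=0}^{n_1-1} V_{(1)}^{k_1} f_{i,1}}{n_1} \Bigr| \Bigr|_{p_1} \cdots \Bigl| \sup_{1 \le n_d < \infty} \Bigl| \frac{\sum_{k_d=0}^{n_d-1} V_{(d)}^{k_d} f_{i,d}}{n_d} \Bigr| \Bigr|_{p_d}$$
$$\le C \sum_i |f_{i,1}|_{p_1} \cdots |f_{i,d}|_{p_d} \le C\,(|f|_{p_1, \dots, p_d, \pi} + \epsilon),$$

where $C = C(p_1) \cdots C(p_d)$. Since $\epsilon > 0$ is arbitrary, the lemma follows.
In the above formulas $V_{(1)}, \dots, V_{(d)}$ are the dynamical operators associated
with the transformations $T_{(1)}, \dots, T_{(d)}$. □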

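Proof of Proposition [3]. In view of Lemma [2], the proof is straightforward.
First we prove the assertions of the proposition for elementary
tensors. Let $f = f_1 \otimes \cdots \otimes f_d$ with $f_l \in L_{p_l}(\mu)$, $1 \le l \le d$. Then the
corresponding normalized V-statistic can be written in the product
form

$$\frac{\sum_{k_1=0}^{n_1-1} V_{(1)}^{k_1} f_1}{n_1} \cdots \frac{\sum_{k_d=0}^{n_d-1} V_{(d)}^{k_d} f_d}{n_d},$$

where by the individual ergodic theorem the $l$-th term in the product
converges to $E_{\mathrm{inv},l} f_l$ with probability 1. Hence, the product tends with
probability 1 to

$$(E_{\mathrm{inv},1} f_1) \cdots (E_{\mathrm{inv},d} f_d) = D_d E_{\mathrm{inv}} f.$$

The same conclusion holds for finite sums of elementary tensors, which
are dense in the space $L_{p_1}(\mu) \otimes_\pi \cdots \otimes_\pi L_{p_d}(\mu)$. Let now $f \in L_{p_1}(\mu) \otimes_\pi
\cdots \otimes_\pi L_{p_d}(\mu)$. Fix an $\epsilon > 0$. There exists an element $f_\epsilon \in L_{p_1}(\mu) \otimes_\pi
\cdots \otimes_\pi L_{p_d}(\mu)$ with $|f - f_\epsilon|_{p_1, \dots, p_d, \pi} < \epsilon$ such that the a.s. assertion
of the proposition holds for $f_\epsilon$. Recall that the operator $D_d (E_{\mathrm{inv},1} \otimes_\pi
\cdots \otimes_\pi E_{\mathrm{inv},d})$ has norm 1, hence with probability 1

$$0 \le \xi \overset{\mathrm{def}}{=} \limsup_{n_1 \to \infty, \dots, n_d \to \infty} \Bigl| \frac{1}{n_1 \cdots n_d} \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_d \le n_d - 1} D_d V^{(k_1, \dots, k_d)} f - D_d (E_{\mathrm{inv},1} \otimes_\pi \cdots \otimes_\pi E_{\mathrm{inv},d}) f \Bigr|$$
$$\le \limsup_{n_1 \to \infty, \dots, n_d \to \infty} \Bigl| \frac{1}{n_1 \cdots n_d} \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_d \le n_d - 1} D_d V^{(k_1, \dots, k_d)} (f - f_\epsilon) \Bigr|$$
$$+ \limsup_{n_1 \to \infty, \dots, n_d \to \infty} \Bigl| \frac{1}{n_1 \cdots n_d} \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_d \le n_d - 1} D_d V^{(k_1, \dots, k_d)} f_\epsilon - D_d (E_{\mathrm{inv},1} \otimes_\pi \cdots \otimes_\pi E_{\mathrm{inv},d}) f_\epsilon \Bigr|$$
$$+ \bigl| D_d (E_{\mathrm{inv},1} \otimes_\pi \cdots \otimes_\pi E_{\mathrm{inv},d}) (f_\epsilon - f) \bigr|$$
$$\overset{\mathrm{def}}{=} \xi_{1,\epsilon} + \xi_{2,\epsilon} + \xi_{3,\epsilon}.$$

We have $\xi_{2,\epsilon} = 0$, $|\xi_{3,\epsilon}|_r \le \epsilon$, and, in view of Lemma [2], $|\xi_{1,\epsilon}|_r \le C\epsilon$.
This implies $\xi = 0$, which proves the convergence with probability 1. To
establish the $L_r$-convergence, we observe that we have the convergence
with probability 1 along with the domination by an $L_r$-function given
by Lemma [2]. Hence, we can apply Theorem 3.3.8 in [22]; if $r = 1$,
Lebesgue's dominated convergence theorem applies, too. □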

3.2. Applications to the SLLN for von Mises statistics. We return
here to the assumption that the transformations $T_{(1)}, \dots, T_{(d)}$ are
copies of the same transformation $T$. For simplicity we assume that $T$
is ergodic. Symmetry of the kernel will not be assumed.

Theorem 1. Let $r = p/d$ for some integer $d \ge 2$ and a real number
$p \ge d$. Let $T$ be an ergodic measure preserving transformation of a
probability space $(X, \mathcal{F}, \mu)$. Assume also that $f \in L_{p,\pi}(\mu^d)$. Then, as
$n \to \infty$, the sequence

(11) $$\frac{1}{n^d} \sum_{0 \le k_1 < n,\ \dots,\ 0 \le k_d < n} D_d V^{(k_1, \dots, k_d)} f$$

converges with probability 1 and in $L_r(\mu)$ to the limit

$$\int_{X^d} (J_d f)(x_1, \dots, x_d)\, \mu(dx_1) \cdots \mu(dx_d).$$

Here $J_d : L_{p,\pi}(\mu^d) \to L_p(\mu^d)$ is the operator introduced in Lemma [1].

Proof. The theorem follows from Proposition [3]. We only need to identify
the limits. Since the limit expressions given in Proposition [3] and in
the theorem are both continuous in the projective norm, it suffices to
check that these expressions agree for elementary tensors $f_1 \otimes \cdots \otimes f_d$.
It is straightforward to check that in the ergodic case both expressions
reduce to

$$E f_1 \cdots E f_d,$$

where $E$ denotes the integral with respect to $\mu$. □
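The statement of the theorem can be illustrated by a small simulation. The setting below is an assumption made for illustration and does not come from the original text: $T$ is the rotation of $[0, 1)$ by the golden-ratio angle (ergodic for Lebesgue measure), $d = 2$, and the kernel is $f(x, y) = (x - y)^2 = x^2 - 2xy + y^2$, a finite sum of elementary tensors whose double integral equals $1/6$. The names `v_statistic` and `orbit_of_rotation` are hypothetical.

```python
import numpy as np

ALPHA = (np.sqrt(5.0) - 1.0) / 2.0        # rotation angle (golden ratio conjugate)

def orbit_of_rotation(x0, n):
    """The points x0, T x0, ..., T^{n-1} x0 for T x = x + ALPHA mod 1."""
    return np.mod(x0 + ALPHA * np.arange(n), 1.0)

def v_statistic(kernel, orbit):
    """n^{-2} times the sum over 0 <= i, j < n of kernel(T^i x, T^j x)."""
    return float(np.mean(kernel(orbit[:, None], orbit[None, :])))

def kernel(x, y):
    return (x - y) ** 2                   # = x^2 - 2xy + y^2

limit = 1.0 / 6.0                         # integral of (x - y)^2 over [0, 1]^2
values = {n: v_statistic(kernel, orbit_of_rotation(0.1, n)) for n in (10, 100, 1000)}
```

For this kernel the V-statistic is twice the empirical variance of the orbit, so its convergence to $1/6$ reduces to two applications of the ergodic theorem, exactly as in the proof for elementary tensors.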

Corollary 1. In the case $p = d$ Theorem [1] applies and gives the
convergence with probability 1 and in $L_1(\mu)$.

Remark 4. Examples show that it is possible to extend the class of
kernels to which the conclusion in Corollary [1] applies to such kernels
$f \in L_p(\mu^d)$ which can be "sandwiched" between decreasing and increasing
sequences of some $L_{p,\pi}(\mu^d)$-kernels whose common $L_p(\mu^d)$-limit is
$f$. This may be a sign that more appropriate functional spaces to treat
the SLLN can be found. No similar examples for distributional results
are known to the authors.

4. Hoeffding's decomposition 

In this section we recall well-known properties of Hoeffding's decomposition
for kernels in the spaces $L_p$, omitting proofs (see [37] for proofs
and other properties of Hoeffding's decomposition). It is not hard to
see that the results and formulas related to this decomposition (both
general and symmetric) apply also to the spaces $L_{p,\pi}(\mu^d)$ and, in case
$\mu_1 = \dots = \mu_d = \mu$, to the symmetric subspace $L^{\mathrm{sym}}_{p,\pi}(\mu^d) \subset L_{p,\pi}(\mu^d)$.

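4.1. Hoeffding's decomposition for general kernels. Let
$(X_1, \mathcal{F}_1, \mu_1), \dots, (X_d, \mathcal{F}_d, \mu_d)$ be probability spaces. Though we are
mostly interested in the particular case when all $(X_l, \mathcal{F}_l, \mu_l)$, $l = 1, \dots, d$,
are copies of the same space $(X, \mathcal{F}, \mu)$, the latter is not assumed in this
subsection. Let $\mathcal{S}_d$ ($\mathcal{S}_d^m$) be the set of all subsets (respectively, subsets
of cardinality $m$) of $\{1, \dots, d\}$. For every $S \subset \{1, \dots, d\}$ we set

$$(X^S, \mathcal{F}^S, \mu^S) = \Bigl( \prod_{l \in S} X_l,\ \bigotimes_{l \in S} \mathcal{F}_l,\ \prod_{l \in S} \mu_l \Bigr).$$

Denoting by $E^{\mathcal{G}}$ the conditional expectation with respect to a $\sigma$-field
$\mathcal{G} \subset \mathcal{F}$ and by $\pi_l$ the canonical map from $X^{\{1, \dots, d\}}$ onto $X^{\{l\}} = X_l$
($l = 1, \dots, d$), we set for every $S \in \mathcal{S}_d$

$$\mathcal{F}_S = \bigvee_{l \in S} \pi_l^{-1}(\mathcal{F}_l), \quad E_S = E^{\mathcal{F}_S} \quad \text{and} \quad E^l = E_{\{1, \dots, d\} \setminus \{l\}}.$$

In other terms, applying $E^l$ means that one integrates out the $l$-th
variable.

We have the following decomposition of the identity operator $I$ in
$L_p(\mu^{\{1, \dots, d\}})$ ($p \in [1, \infty]$):

$$I = \prod_{l=1}^{d} \bigl( E^l + (I - E^l) \bigr) = \sum_{m=0}^{d} \sum_{S \in \mathcal{S}_d^m} \prod_{l \notin S} E^l \prod_{l' \in S} (I - E^{l'})$$

or

$$I = \sum_{m=0}^{d} \sum_{S \in \mathcal{S}_d^m} Q_S,$$

where

$$Q_S = \prod_{l \notin S} E^l \prod_{l' \in S} (I - E^{l'}).$$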

In general, Hoeffding's decomposition assigns to every $f \in L_p(\mu^{\{1, \dots, d\}})$
the family $(R_S f)_{S \in \mathcal{S}_d}$ such that

i) for every $S \in \mathcal{S}_d$, $R_S f \in L_p(\mu^S)$;

ii) for every $S = \{l_1, \dots, l_m\} \in \mathcal{S}_d^m$

$$(R_S f) \circ \pi_S = Q_S f,$$

where $\pi_S : X^d \to X^S$ is defined by $\pi_S(x_1, \dots, x_d) = (x_{l_1}, \dots, x_{l_m})$;

iii) every $R_S f$ is canonical in the sense that for every $l \in S$

$$E^l R_S f = 0.$$

It follows that every $f \in L_p(\mu^{\{1, \dots, d\}})$ can be represented in a unique
way in the form

(12) $$f = \sum_{m=0}^{d} \sum_{S \in \mathcal{S}_d^m} (R_S f) \circ \pi_S.$$

As we said before, Hoeffding's decomposition also holds for $L_{p,\pi}(\mu^{\{1, \dots, d\}}) =
L_p(\mu_1) \otimes_\pi \cdots \otimes_\pi L_p(\mu_d)$, and we shall use the above notation for the
operators on these spaces as well. We also used before and will use
in the rest of the paper the notation $L_{p,\pi}(\mu^d)$ instead of $L_{p,\pi}(\mu^{\{1, \dots, d\}})$
even in the case of possibly different measures $\mu_1, \dots, \mu_d$ whenever this
does not cause a misunderstanding.

The degree of a kernel $f$ with decomposition ([12]) (or the similar
decomposition ([13]) below) is, by definition, the smallest integer $d' \le d$
such that we have

$$f = \sum_{m=0}^{d'} \sum_{S \in \mathcal{S}_d^m} (R_S f) \circ \pi_S.$$

For example, if $S \in \mathcal{S}_d^m$ and $(R_S f) \circ \pi_S \ne 0$, the degree of $(R_S f) \circ \pi_S$
equals $m$. Further, by definition, the order of $f$ with the decomposition
([12]) equals the lowest degree of non-zero summands in ([12]) (so that
for $(R_S f) \circ \pi_S \ne 0$ the degree and the order agree). A kernel $f$ in ([12])
is called degenerate if its order is greater than one and non-degenerate
if it equals one.



4.2. Hoeffding's decomposition of symmetric kernels. Unlike in
the previous subsection, we assume here that all spaces $(X_l, \mathcal{F}_l, \mu_l)$,
$l = 1, \dots, d$, are copies of the same probability space $(X, \mathcal{F}, \mu)$. In
this subsection $L_p(\mu^d)$ and $L_{p,\pi}(\mu^d)$ denote, respectively, the usual
$L_p$-space of the product of $d$ identical probability spaces and the
projective tensor product $L_p(\mu) \otimes_\pi \cdots \otimes_\pi L_p(\mu)$ ($d$ times) with the
norm $|\cdot|_{p,d,\pi}$.

There is an isometric action of the symmetric group by permutations
of multipliers on each of these spaces, and the fixed points of
such an action form a closed subspace whose notation will contain the
superscript sym. The next property of Hoeffding's decomposition is
specific for the symmetric case.

iv) whenever the function $f$ belongs to $L_p^{\mathrm{sym}}(\mu^d)$, the subspace of
symmetric functions in $L_p(\mu^d)$, the canonical function $R_S f$ does
not depend on the choice of $S \in \mathcal{S}_d^m$ and is symmetric; thus, in
this case there exist operators $R_m : L_p^{\mathrm{sym}}(\mu^d) \to L_p^{\mathrm{sym}}(\mu^m)$ such
that for every $S = \{l_1, \dots, l_m\} \in \mathcal{S}_d^m$

$$(R_m f) \circ \pi_S = Q_S f.$$

Furthermore, every $f \in L_p^{\mathrm{sym}}(\mu^d)$ can be represented in a unique way
in the form

(13) $$f = \sum_{m=0}^{d} \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S.$$

Example. We illustrate the difference between general and symmetric
kernels for $d = 2$. For a general kernel $f \in L_p(\mu^2)$ we get

$$f(x_1, x_2) = f_\emptyset + f_{\{1\}}(x_1) + f_{\{2\}}(x_2) + f_{\{1,2\}}(x_1, x_2),$$

where

$$f_\emptyset = \int_{X^2} f(z_1, z_2)\, \mu(dz_1)\, \mu(dz_2),$$
$$f_{\{1\}}(x_1) = \int_X f(x_1, z_2)\, \mu(dz_2) - f_\emptyset,$$
$$f_{\{2\}}(x_2) = \int_X f(z_1, x_2)\, \mu(dz_1) - f_\emptyset,$$
$$f_{\{1,2\}}(x_1, x_2) = f(x_1, x_2) - f_{\{1\}}(x_1) - f_{\{2\}}(x_2) - f_\emptyset.$$

Notice, in order to illustrate the notion of canonical kernels, that we
have for almost every $x_1, x_2 \in X$

$$\int_X f_{\{1\}}(z)\, \mu(dz) = 0, \quad \int_X f_{\{2\}}(z)\, \mu(dz) = 0$$

and

$$\int_X f_{\{1,2\}}(z_1, x_2)\, \mu(dz_1) = \int_X f_{\{1,2\}}(x_1, z_2)\, \mu(dz_2) = 0.$$

For a kernel $f \in L_p^{\mathrm{sym}}(\mu^2)$, the above relations reduce to

$$f(x_1, x_2) = f_0 + f_1(x_1) + f_1(x_2) + f_2(x_1, x_2),$$

where

$$f_0 = \int_{X^2} f(z_1, z_2)\, \mu(dz_1)\, \mu(dz_2),$$
$$f_1(x) = \int_X f(x, z)\, \mu(dz) - f_0 \ \Bigl( = \int_X f(z, x)\, \mu(dz) - f_0 \Bigr),$$
$$f_2(x_1, x_2) = f(x_1, x_2) - f_1(x_1) - f_1(x_2) - f_0.$$

Here

$$\int_X f_1(z)\, \mu(dz) = 0,$$

$f_2 \in L_p^{\mathrm{sym}}(\mu^2)$ and for almost every $x \in X$ we have

$$\int_X f_2(z, x)\, \mu(dz) \ \Bigl( = \int_X f_2(x, z)\, \mu(dz) \Bigr) = 0.$$
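The decomposition for $d = 2$ is easy to verify on a finite probability space, where kernels are matrices and integration is a weighted sum. The sketch below is an illustration under assumptions not present in the original text: $X$ has six points with random weights and the kernel is a random matrix. The degree-two component is formed by subtracting the constant term, so that the four components add up to $f$. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = rng.random(6)
mu /= mu.sum()                          # probability weights on a 6-point space X
f = rng.normal(size=(6, 6))             # a general (non-symmetric) kernel on X^2

f_empty = float(mu @ f @ mu)            # integral of f over X^2
f_1 = f @ mu - f_empty                  # integrate out the second variable
f_2 = mu @ f - f_empty                  # integrate out the first variable
f_12 = f - f_1[:, None] - f_2[None, :] - f_empty

# Sum of the four Hoeffding components.
reconstructed = f_empty + f_1[:, None] + f_2[None, :] + f_12

# Canonical property: integrating out either variable of f_12 gives zero,
# and the degree-one components have zero mean.
marginal_first = mu @ f_12
marginal_second = f_12 @ mu
```

For a symmetric matrix `f` the vectors `f_1` and `f_2` coincide, which is the reduction to $f_1$ described in the symmetric case.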
5. Exact transformations and related operators

In the remaining part of the paper we deal with distributional convergence
of von Mises statistics for a measure preserving transformation.
Our tool here is a kind of martingale approximation which for $d = 1$
goes back to [27], [29], [39] (in the latter paper only Harris recurrent
Markov chains were considered) and was developed for higher dimensional
random arrays in [30].

The additional structure needed is a filtration compatible with the dynamics
defined by a measure preserving transformation. From now on
we restrict ourselves to a class of measure preserving transformations
of probability spaces which are exact [12]. A discussion of our assumptions
can be found in the last part of subsection 1.4. Let $T$ be
a measure preserving transformation of a probability space $(X, \mathcal{F}, \mu)$.
The transformation $T$ defines a decreasing filtration $(T^{-k}\mathcal{F})_{k \ge 0}$. Exactness
of $T$ means that $\bigcap_{k \ge 0} T^{-k}\mathcal{F} = \mathcal{N}$, where $\mathcal{N}$ is the trivial $\sigma$-field
of the space $(X, \mathcal{F}, \mu)$. As can easily be seen, every exact transformation
is ergodic. The standard assumption of ergodic theory is that
$(X, \mathcal{F}, \mu)$ is a Lebesgue space in the sense of Rokhlin. Under this assumption
it can be shown that, except for the case of the one point
measure space, the Lebesgue space with an exact transformation is an
atomless measure space, hence is isomorphic to the unit interval with
the Lebesgue measure. As before, we denote by $V^*$ the adjoint of the
operator $V$. As the operator $V$ acts as an isometry in all $L_p$ spaces and
preserves the constants and the positivity, the operator $V^*$ also acts
in all these spaces as a contraction preserving constants and positivity.
The operator $V^*$ is a particular case of a Markov transition operator.

For every $k \ge 0$ we have the relations $V^{*k} V^k = I$ and $V^k V^{*k} = E_k$,
where $I$ is the identity operator and $E_k = E^{T^{-k}\mathcal{F}}$ is the corresponding
conditional expectation. Let $E$ denote the expectation operator. We
can easily conclude (for example, from known facts about the convergence
of reversed martingales) that exactness of $T$ is equivalent
to the strong convergence $V^{*n} \to E$ as $n \to \infty$ in every space $L_p(\mu)$ with
$1 \le p < \infty$. In the sequel the strong convergence of the series

(14) $$\sum_{k=0}^{\infty} V^{*k} f$$

and other similar conditions will be imposed on $f$. Set

$$L_p^0(\mu) = \{f \in L_p(\mu) : Ef = 0\}.$$

Note that for every $1 \le p < \infty$ the series ([14]) converges in the norm of
$L_p(\mu)$ for $f$ from a dense subspace of $L_p^0(\mu)$. Equivalently, this series
converges if and only if $f$ can be represented in the form $f = (I - V^*)g$
with some $g \in L_p(\mu)$.
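The operators $V$ and $V^*$ can be made concrete for one standard exact transformation. The example below is an assumption chosen for illustration and is not taken from the original text: $T x = 2x \bmod 1$ on $[0, 1)$ with Lebesgue measure, for which $V^*$ averages a function over the two preimages of a point. The helper names `V`, `V_star` and `power` and the test function are hypothetical.

```python
import numpy as np

def V(f):
    """Koopman operator of the doubling map T x = 2x mod 1: V f = f o T."""
    return lambda x: f(np.mod(2.0 * x, 1.0))

def V_star(f):
    """Adjoint of V for Lebesgue measure: average over the two preimages of x."""
    return lambda x: 0.5 * (f(0.5 * x) + f(0.5 * (x + 1.0)))

def power(op, f, n):
    """Apply the operator op to f exactly n times."""
    for _ in range(n):
        f = op(f)
    return f

def f(t):
    return t ** 2 + np.sin(2 * np.pi * t)      # its integral over [0, 1] is 1/3

x = np.linspace(0.0, 1.0, 201, endpoint=False)

# V* V = I: the adjoint is a left inverse of the isometry V.
left_inverse_gap = float(np.max(np.abs(V_star(V(f))(x) - f(x))))

# Exactness: V*^n f approaches the constant E f as n grows.
mixing_gap = float(np.max(np.abs(power(V_star, f, 10)(x) - 1.0 / 3.0)))
```

In this example $V^{*n} f$ is a Riemann sum of $f$ over $2^n$ equally spaced points, which makes the convergence to $Ef$ visible directly.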

6. Growth rates for multiparameter sums 

It follows from Lemma [1] that the space $L^{\mathrm{sym}}_{p,\pi}(\mu^m)$ can be identified
with a (non-closed) dense subspace of $L^{\mathrm{sym}}_p(\mu^m)$ using the injective
map $J_m$. From now on we will omit the symbol $J_m$. In particular,
we will write $L^{\mathrm{sym}}_{p,\pi}(\mu^m) \subset L^{\mathrm{sym}}_p(\mu^m)$ instead of $J_m(L^{\mathrm{sym}}_{p,\pi}(\mu^m))
\subset L^{\mathrm{sym}}_p(\mu^m)$. Consequently, it makes sense to speak of canonical
elements of $L^{\mathrm{sym}}_{p,\pi}(\mu^m)$.

6.1. Definition and properties of martingale difference spaces
in dimension one. A noninvertible measure preserving transformation
$T$ of a probability space $(X, \mathcal{F}, \mu)$ has a natural decreasing filtration
given by $(T^{-n}\mathcal{F})_{n \ge 0}$. Setting $E_n = E^{T^{-n}\mathcal{F}}$, $n \ge 0$, for the related
conditional expectations, we observe that $E_n E_{n+m} = E_{n+m} E_n = E_{n+m}$
for $m, n \ge 0$. Thus, we obtain a decreasing sequence of conditional
expectations, in particular, of norm one projections, in every space $L_p(\mu)$.
Since $E_n = V^n V^{*n}$ it follows that

$$E_{n+k} = V^k E_n V^{*k} \quad \text{and} \quad E_{n+k} V^k = V^k E_n, \quad k, n = 0, 1, \dots.$$

For every $p \in [1, \infty]$ and $n \ge 0$, $E_n - E_{n+1}$ is a projection in $L_p(\mu)$ of
norm $\le 2$ satisfying the relation

$$E_n - E_{n+1} = V^n (I - V V^*) V^{*n} = V^n (I - E_1) V^{*n}.$$

We set $M_{p,n} = (E_n - E_{n+1}) L_p(\mu)$, $n \ge 0$. These are closed subspaces of
$L_p(\mu)$ satisfying the relations

$$M_{p,n+k} = V^k M_{p,n}, \quad V^{*k} M_{p,n+k} = M_{p,n}, \quad k, n = 0, 1, \dots.$$

In $L_2(\mu)$, $(E_n - E_{n+1})_{n \ge 0}$ is a sequence of orthogonal projections to
mutually orthogonal subspaces $(M_{2,n})_{n \ge 0}$ spanning the space $L_2^0(\mu)$.
Every $h \in M_{p,0}$ defines a reversed martingale difference sequence $(V^n h)_{n \ge 0}$
in $L_p(\mu)$. A multiparameter version of this setting is discussed in the
next subsection as the main tool for bounding various error terms in
the martingale approximation.

Lemma 3. For every $p \in [2, \infty)$ there exists a constant $C(p)$ such that
for every stationary sequence $(\xi_n)_{n \in \mathbb{Z}}$ of martingale differences in $L_p(\mu)$
we have that

\[ \Bigl| \sum_{k=1}^{n-1} \xi_k \Bigr|_p \le C(p) \sqrt{n}\, |\xi_0|_p . \]

Proof. Let $p \in [2, \infty)$. Using Burkholder's inequality (Theorem 9 in
) for the original sequence and then applying the triangle inequality
for the space $L_{p/2}$ to the sequence $(\xi_n^2)_{n \in \mathbb{Z}}$, we obtain

\[ \Bigl| \sum_{k=1}^{n-1} \xi_k \Bigr|_p
   \le C(p) \Bigl| \Bigl( \sum_{k=1}^{n-1} \xi_k^2 \Bigr)^{1/2} \Bigr|_p
   \le C(p) \bigl( n\, |\xi_0^2|_{p/2} \bigr)^{1/2}
   = C(p) \sqrt{n}\, |\xi_0|_p . \]

□
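
For $p = 2$ the bound of Lemma 3 needs no Burkholder inequality: martingale differences are orthogonal, so a sum of $n$ consecutive terms has squared $L_2$-norm exactly $n|\xi_0|_2^2$. The Python sketch below checks this identity for the reversed martingale differences $(V^k h)_{k \ge 0}$ in the Fourier model of the doubling map $Tz = z^2$, where $h$ has only odd frequencies (so that $V^* h = 0$). The model, the sample function `h` and all names are assumptions made for illustration; the sketch says nothing about the case $p > 2$.

```python
# Functions on the circle are stored as {frequency: coefficient};
# assumed model (illustration only): (V f)(z) = f(z**2).

def V(f):
    """Koopman operator of the doubling map: frequency k moves to 2k."""
    return {2 * k: c for k, c in f.items()}

def add(f, g):
    """Sum of two trigonometric polynomials, dropping zero coefficients."""
    out = dict(f)
    for k, c in g.items():
        out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def norm_sq(f):
    """Squared L_2 norm with respect to Haar measure (Parseval)."""
    return sum(abs(c) ** 2 for c in f.values())

def partial_sum(h, n):
    """h + V h + ... + V^(n-1) h."""
    total, term = {}, dict(h)
    for _ in range(n):
        total = add(total, term)
        term = V(term)
    return total

# Only odd frequencies, hence V* h = 0 and h lies in M_{2,0}.
h = {1: 3, -1: 3, 3: -2, -3: -2, 5: 1, -5: 1}

for n in (1, 2, 5, 10):
    # Orthogonality of the V^k h gives equality, i.e. C(2) = 1 suffices here.
    assert norm_sq(partial_sum(h, n)) == n * norm_sq(h)
```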



6.2. Bounds for multiparameter sums. For every $m$, $0 \le m \le d$,
let $\mathcal{S}_m$ ($\mathcal{S}_m^s$) be the set of all subsets (respectively, of all subsets of
cardinality $s \in \{0, \dots, m\}$) of the set $N(m) = \{1, \dots, m\}$. For every
$S \in \mathcal{S}_m$ define a subsemigroup

\[ \{ (n_1, \dots, n_m) \in \mathbb{Z}_+^m : n_k = 0 \text{ for all } k \notin S \}. \]

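Lemma 4. Let $m \in \{1, \dots, d\}$ and let $e_1, \dots, e_m$ denote the standard
basis of $\mathbb{Z}^m$. Then, for every real $p \in [2, \infty)$ and every integer
$s \in \{1, \dots, m\}$, there exists a constant $C(p,s) > 0$ with the following
property: For every $S \in \mathcal{S}_m^s$ and $f \in L_{p,\pi}(\mu^m)$ satisfying

\[ V^{*e_l} f = 0, \qquad l \in S, \tag{15} \]

the relation

\[ \Bigl| \sum_{0 \le k_t \le n_t - 1,\ t \in S} V^{\sum_{t \in S} k_t e_t} f \Bigr|_{p,m,\pi}
   \le C(p,s) \Bigl( \prod_{t \in S} \sqrt{n_t} \Bigr) |f|_{p,m,\pi} \tag{16} \]

holds for every family $(n_t)_{t \in S}$ of natural numbers. Moreover, if $p \ge m$
and $r = p/m$, we also obtain

\[ \Bigl| \sum_{0 \le k_t \le n_t - 1,\ t \in S} D_m V^{\sum_{t \in S} k_t e_t} f \Bigr|_r
   \le C(p,s) \Bigl( \prod_{t \in S} \sqrt{n_t} \Bigr) |f|_{p,m,\pi} \]

for every $(n_t)_{t \in S}$.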

Proof. Let $s$ and $S$ be as in the statement of the lemma. Since $|f|_{p,m,\pi}$
increases in $p$ and the norm of the map $D_m : L_{p,\pi}(\mu^m) \to L_r(\mu)$ is one,
it suffices to prove (16).

Let $0_m$ denote the neutral element of $\mathbb{Z}^m$. Set

\[ M^S_{p,0_m,\pi} = \{ f \in L_{p,\pi}(\mu^m) : V^{*e_l} f = 0 \text{ for every } l \in S \}. \]

Observe that the subspace $M^S_{p,0_m,\pi} \subset L_{p,\pi}(\mu^m)$ itself can be represented
as the projective tensor product of $s$ copies of the subspace $M_{p,0} \subset L_p(\mu)$
and $m - s$ copies of the space $L_p(\mu)$. Notice that relations (15) are
equivalent to the following description of the corresponding subspace
in terms of projections:

\[ (I - V^{e_l} V^{*e_l}) f = f \quad \text{for every } l \in S. \]

The subspace $M^S_{p,0_m,\pi}$ can also be described as the range of the projection

\[ \prod_{l \in S} (I - V^{e_l} V^{*e_l}). \]

We need the following fact which follows from Proposition 2.4 in [43].
Assume that we have Banach spaces $A_l$ and their closed subspaces
$B_l \subset A_l$, $l = 1, \dots, m$. Without further assumptions we only have a
canonical linear map

\[ i : B_1 \otimes_\pi \cdots \otimes_\pi B_m \to A_1 \otimes_\pi \cdots \otimes_\pi A_m \]

of norm one. However, if every $B_l$ is a complemented subspace in the
corresponding $A_l$ (that is, there exists a bounded projection $\varphi_l : A_l \to B_l$)
then this map is a topological linear isomorphism onto its range
(the latter is closed in $A_1 \otimes_\pi \cdots \otimes_\pi A_m$). Moreover, if every $\varphi_l$ is of
norm one then this map is an isometry.

Thus, if bounded projections $(\varphi_l)_{l=1,\dots,m}$ exist, we can consider
$B_1 \otimes_\pi \cdots \otimes_\pi B_m$ as a closed subspace of $A_1 \otimes_\pi \cdots \otimes_\pi A_m$. Moreover, the map

\[ \varphi_1 \otimes_\pi \cdots \otimes_\pi \varphi_m \]

is a bounded projection of $A_1 \otimes_\pi \cdots \otimes_\pi A_m$ onto its subspace $B_1 \otimes_\pi \cdots \otimes_\pi B_m$.
The latter subspace can be described by

\[ B_1 \otimes_\pi \cdots \otimes_\pi B_m = \{ f \in A_1 \otimes_\pi \cdots \otimes_\pi A_m : (\varphi_1 \otimes_\pi \cdots \otimes_\pi \varphi_m) f = f \} \]

or, equivalently, by

\[ B_1 \otimes_\pi \cdots \otimes_\pi B_m
   = \{ f \in A_1 \otimes_\pi \cdots \otimes_\pi A_m : ((I - \varphi_1) \otimes_\pi I \otimes_\pi \cdots \otimes_\pi I) f = 0;\
   (I \otimes_\pi (I - \varphi_2) \otimes_\pi \cdots \otimes_\pi I) f = 0;\ \dots;\
   (I \otimes_\pi I \otimes_\pi \cdots \otimes_\pi (I - \varphi_m)) f = 0 \}. \]

Moreover, the projective tensor norm on the space $B_1 \otimes_\pi \cdots \otimes_\pi B_m$ and
the norm inherited from $A_1 \otimes_\pi \cdots \otimes_\pi A_m$ via the embedding are equivalent.

We will apply this assertion to the case when $A_l = L_p(\mu)$ for every
$l \in \{1, \dots, m\}$, $B_l = M_{p,0}$, $\varphi_l = I - VV^*$ for $l \in S$, and $B_l = L_p(\mu)$, $\varphi_l = I$
for $l \notin S$. Since $VV^*$ is a conditional expectation, it is clear that $\varphi_l$ is
bounded for every $l$ (in fact its norm does not exceed $2^{1-(2/p)}$). With
this notation we have that $M^S_{p,0_m,\pi}$ and $B_1 \otimes_\pi \cdots \otimes_\pi B_m$ are isomorphic
as topological vector spaces. Observe that we have here the same vector
space which is equipped with two possibly different norms: the norm
inherited from $L_{p,\pi}(\mu^m)$ and the projective tensor product norm,
respectively. According to one of the properties of the projective tensor
norm ([HI], Proposition 2.8), for every $f \in M^S_{p,0_m,\pi}$ and $\varepsilon > 0$ there
exists a bounded family of functions $f_{i,l} \in B_l$ ($1 \le i < \infty$, $1 \le l \le m$)
such that

\[ f = \sum_i f_{i,1} \otimes \cdots \otimes f_{i,m} \]

and

\[ \sum_i |f_{i,1}|_p \cdots |f_{i,m}|_p \le C'(p,s) |f|_{p,m,\pi} + \varepsilon. \]

The constant $C'(p,s)$ appears here because we put into the right hand
side the inherited norm $|f|_{p,m,\pi}$ of $f$ rather than its norm in $B_1 \otimes_\pi \cdots \otimes_\pi B_m$.
Set, for $l = 1, \dots, m$ and every $i$, $F_{i,l} = \sum_{0 \le k \le n_l - 1} V^k f_{i,l}$ if $l \in S$, and
$F_{i,l} = f_{i,l}$ if $l \notin S$. Then, applying Lemma 3, it follows that

\[
\begin{aligned}
\Bigl| \sum_{0 \le k_l \le n_l - 1,\ l \in S} V^{\sum_{l \in S} k_l e_l} f \Bigr|_{p,m,\pi}
 &= \Bigl| \sum_i \sum_{0 \le k_l \le n_l - 1,\ l \in S} V^{\sum_{l \in S} k_l e_l} (f_{i,1} \otimes \cdots \otimes f_{i,m}) \Bigr|_{p,m,\pi} \\
 &\le \sum_i \prod_{l \in \{1,\dots,m\}} |F_{i,l}|_p
  = \sum_i \prod_{l \in S} \Bigl| \sum_{k=0}^{n_l - 1} V^k f_{i,l} \Bigr|_p \prod_{l \notin S} |f_{i,l}|_p \\
 &\le C^s(p) \Bigl( \prod_{l \in S} \sqrt{n_l} \Bigr) \sum_i \prod_{l \in \{1,\dots,m\}} |f_{i,l}|_p \\
 &\le C^s(p) \Bigl( \prod_{l \in S} \sqrt{n_l} \Bigr) \bigl( C'(p,s) |f|_{p,m,\pi} + \varepsilon \bigr).
\end{aligned}
\]

Since $\varepsilon > 0$ is arbitrary, inequality (16) follows with $C(p,s) = C^s(p) C'(p,s)$. □

Remark 5. Every $f$ satisfying the assumptions of the above lemma is
$S$-canonical in the following sense: integrating $f$ over the $l$-th variable
returns $0$ whenever $l \in S$. Indeed, every operator $V^{*e_l}$ preserves
the integrals with respect to the $l$-th variable, so the claim follows
from (15) under the assumptions of Lemma 4.



The following lemma gives a sufficient condition under which
the martingale-coboundary decomposition is valid.

Lemma 5. Let $p \in [1, \infty]$ and $f \in L_{p,\pi}(\mu^m)$ be a canonical kernel such
that the series in the right hand side of

\[ g = \sum_{0 \le k_1 < \infty,\ \dots,\ 0 \le k_m < \infty} V^{*(k_1,\dots,k_m)} f
   \quad \Bigl( := \lim_{n_1 \to \infty,\ \dots,\ n_m \to \infty}\ \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} V^{*(k_1,\dots,k_m)} f \Bigr) \tag{17} \]

converges in $L_{p,\pi}(\mu^m)$. Then $f$ can be represented in the form

\[ f = \sum_{S \in \mathcal{S}_m} A^S f, \tag{18} \]

where for every $S \in \mathcal{S}_m$

\[ A^S f = \Bigl( \prod_{t \notin S} (I - V_t V_t^*) \prod_{t \in S} (V_t - I) \Bigr) h^S \tag{19} \]

and the function $h^S \in L_{p,\pi}(\mu^m)$ is defined by the relation

\[ h^S = \Bigl( \prod_{t \in S} V_t^* \Bigr) g. \tag{20} \]

The summands in (18) are uniquely determined.

Proof. See the proof of Proposition 1 in [30]. □



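Proposition 4. Let $0 \le s \le m$ and $f$ be a kernel satisfying the
assumptions of Lemma 5 for some $p \in [2, \infty)$. Let $A^S f$ be defined by
formulas (19) and (20). Then there exists a constant $C_{p,m,s} > 0$ such
that for every $S \in \mathcal{S}_m^{m-s}$ and every $n_1, \dots, n_m$

\[ \Bigl| \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} V^{(k_1,\dots,k_m)} A^S f \Bigr|_{p,m,\pi}
   \le C_{p,m,s} \Bigl( \prod_{t \notin S} \sqrt{n_t} \Bigr) |g|_{p,m,\pi}, \tag{21} \]

where $g$ is defined in (17). Moreover, for $p \ge m$ and $r = p/m$,

\[ \Bigl| \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} D_m V^{(k_1,\dots,k_m)} A^S f \Bigr|_r
   \le C_{p,m,s} \Bigl( \prod_{t \notin S} \sqrt{n_t} \Bigr) |g|_{p,m,\pi}. \tag{22} \]

Proof. Setting $\bar S = \{1, \dots, m\} \setminus S$, we have

\[
\begin{aligned}
\sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} V^{(k_1,\dots,k_m)} A^S f
 &= \sum_{0 \le k_t \le n_t - 1,\ t \in \bar S}\ \sum_{0 \le l_u \le n_u - 1,\ u \in S}
    V^{\sum_{t \in \bar S} k_t e_t} V^{\sum_{u \in S} l_u e_u}
    \prod_{r \notin S} (I - V_r V_r^*) \prod_{u \in S} (V_u - I)\, h^S \\
 &= \sum_{0 \le k_t \le n_t - 1,\ t \in \bar S} V^{\sum_{t \in \bar S} k_t e_t}
    \prod_{r \notin S} (I - V_r V_r^*) \prod_{u \in S} (V_u^{n_u} - I)\, h^S.
\end{aligned}
\tag{23}
\]

Since for $l \in \bar S$

\[ V^{*e_l} \prod_{r \notin S} (I - V_r V_r^*) \prod_{u \in S} (V_u^{n_u} - I)\, h^S = 0 \]

and

\[ \Bigl| \prod_{r \notin S} (I - V_r V_r^*) \prod_{u \in S} (V_u^{n_u} - I)\, h^S \Bigr|_{p,m,\pi} \le 2^m |g|_{p,m,\pi}, \]

the proposition follows with $C_{p,m,s} = 2^m C(p,s)$ from Lemma 4 and
formula (23). □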



Proposition 5. Let $p \ge 2$ and $f \in L_{p,\pi}(\mu^m)$ be a canonical kernel such
that the series in the right hand side of

\[ g = \sum_{0 \le k_1 < \infty,\ \dots,\ 0 \le k_m < \infty} V^{*(k_1,\dots,k_m)} f
   \quad \Bigl( := \lim_{n_1 \to \infty,\ \dots,\ n_m \to \infty}\ \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} V^{*(k_1,\dots,k_m)} f \Bigr) \]

converges in $L_{p,\pi}(\mu^m)$. Then for every $n_1, \dots, n_m$ the following
inequality holds

\[ \Bigl| \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} V^{(k_1,\dots,k_m)} f \Bigr|_{p,m,\pi}
   \le C_{p,m} \sqrt{n_1 \cdots n_m}\; |g|_{p,m,\pi}, \tag{24} \]

where $C_{p,m}$ is a constant depending only on $m$ and $p$. If, in addition,
$p \in [m, \infty)$ then, with $r = p/m$, we also have that

\[ \Bigl| \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} D_m V^{(k_1,\dots,k_m)} f \Bigr|_r
   \le C_{p,m} \sqrt{n_1 \cdots n_m}\; |g|_{p,m,\pi}. \]

Proof. Again, since the norm of the operator $D_m : L_{p,\pi}(\mu^m) \to L_r(\mu)$
is one, we only need to prove (24). As $n_1 \ge 1, \dots, n_m \ge 1$, we have for
every $S \in \mathcal{S}_m$

\[ \frac{\prod_{t \notin S} \sqrt{n_t}}{\sqrt{n_1 \cdots n_m}} \le 1. \]

Using this relation along with (18) and (21) we obtain (24) with

\[ C_{p,m} = \sum_{s=0}^{m} \binom{m}{s} C_{p,m,s}. \]

□



7. Central Limit Theorem in non-degenerate case 

We denote by $N(m, \sigma^2)$ the normal distribution with mean value $m \in \mathbb{R}$
and variance $\sigma^2 \ge 0$ ($N(m, 0)$ denotes the Dirac measure at $m \in \mathbb{R}$).
We first prove a central limit theorem including the convergence of the
second moments as well. Also note that in Theorem 2 the assumptions
in the case $m = 1$ are stated separately from those for $m \ge 2$; a unified
condition is certainly possible.

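Theorem 2. Let $f \in L_2^{\mathrm{sym}}(\mu^d)$ be a real valued kernel, and

\[ f = \sum_{m=0}^{d} \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S \]

be the symmetric Hoeffding decomposition of $f$. Assume that

(1) the series

\[ \sum_{k=0}^{\infty} V^{*k} R_1 f \tag{25} \]

converges in $L_2(\mu)$,

(2) for every $m = 2, \dots, d$, $R_m f \in L_{2m,\pi}^{\mathrm{sym}}(\mu^m)$, and the series

\[ \sum_{0 \le k_1 < \infty,\ \dots,\ 0 \le k_m < \infty} V^{*(k_1,\dots,k_m)} R_m f
   \quad \Bigl( := \lim_{n_1 \to \infty,\ \dots,\ n_m \to \infty}\ \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} V^{*(k_1,\dots,k_m)} R_m f \Bigr) \tag{26} \]

converges in $L_{2m,\pi}(\mu^m)$.

Then

\[ V_n^{(d)} f = \frac{1}{n^{d-1/2}} \sum_{0 \le k_1 \le n-1,\ \dots,\ 0 \le k_d \le n-1} D_d V^{(k_1,\dots,k_d)} (f - R_0 f) \]

converges in distribution to $N(0, d^2 \sigma^2(f))$, where

\[ \sigma^2(f) = \Bigl| \sum_{k=0}^{\infty} V^{*k} R_1 f \Bigr|_2^2 - \Bigl| \sum_{k=1}^{\infty} V^{*k} R_1 f \Bigr|_2^2 \ge 0. \]

The convergence of the second moments

\[ E\bigl( V_n^{(d)} f \bigr)^2 \to d^2 \sigma^2(f) \quad (n \to \infty) \]

holds as well.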

Remark 6. According to the standard terminology the kernel $f$ is called
non-degenerate if $R_1 f$ is not identically zero, and it is called degenerate
otherwise. In the case of i.i.d. variables this form of non-degeneracy
is equivalent to the non-degeneracy of the limit Gaussian distribution
under the normalization by the constants $n^{d-1/2}$. However, in the
general stationary dependent case such a static non-degeneracy can be
combined with the degeneracy of the limit distribution. This phenomenon
can be called dynamic degeneracy.

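Proof. Decompose $f - R_0 f$ in the following way:

\[ f - R_0 f = \sum_{m=1}^{d} \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S = \sum_{m=1}^{d} f_m, \]

where

\[ f_m = \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S, \qquad m = 1, \dots, d. \]

To prove the theorem it suffices to establish that

1) $V_n^{(d)} f_1$ converges in distribution to $N(0, d^2 \sigma^2(f))$,

2) $|V_n^{(d)} f_1|_2^2 \to d^2 \sigma^2(f)$ as $n \to \infty$,

3) $\bigl| V_n^{(d)} \sum_{m=2}^{d} f_m \bigr|_2 \to 0$ as $n \to \infty$.

In view of the equality

\[ D_d \Bigl( V^{(k_1,\dots,k_d)} \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S \Bigr)
   = \sum_{S = \{i_1,\dots,i_m\} \in \mathcal{S}_d^m} D_m V^{(k_{i_1},\dots,k_{i_m})} R_m f \]

we obtain

\[
\begin{aligned}
V_n^{(d)} f_m &= V_n^{(d)} \Bigl( \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S \Bigr)
 = \frac{1}{n^{d-1/2}} \sum_{0 \le k_1 \le n-1,\ \dots,\ 0 \le k_d \le n-1} D_d \Bigl( V^{(k_1,\dots,k_d)} \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S \Bigr) \\
 &= \frac{1}{n^{d-1/2}} \sum_{0 \le k_1 \le n-1,\ \dots,\ 0 \le k_d \le n-1}\ \sum_{S = \{i_1,\dots,i_m\} \in \mathcal{S}_d^m} D_m V^{(k_{i_1},\dots,k_{i_m})} R_m f \\
 &= \binom{d}{m} \frac{1}{n^{m-1/2}} \sum_{0 \le k_1 \le n-1,\ \dots,\ 0 \le k_m \le n-1} D_m V^{(k_1,\dots,k_m)} R_m f
\end{aligned}
\tag{27}
\]

for every $m = 1, \dots, d$. It follows from (27), Proposition 5 with $p = 2m$
and the assumptions of the theorem that the function $f_m$ satisfies the
inequality

\[ |V_n^{(d)} f_m|_2 \le C_m \binom{d}{m} n^{(1-m)/2} |g_m|_{2m,m,\pi}, \]

where $g_m$ denotes the sum of the series (25) ($m = 1$) or (26) ($m \ge 2$).
For $m \ge 2$ the latter bound implies the convergence to $0$ in $L_2(\mu)$, which
proves 3).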
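
The last equality in (27) is a pure counting step: for a fixed subset $S$ of size $m$, each $m$-tuple $(k_{i_1}, \dots, k_{i_m})$ arises from exactly $n^{d-m}$ of the $d$-tuples $(k_1, \dots, k_d)$, and there are $\binom{d}{m}$ subsets. The following brute-force Python sketch checks this identity for small $n$, $d$, $m$. The function `F` is a hypothetical stand-in for the value of $D_m V^{(k_1,\dots,k_m)} R_m f$ at a fixed point and is chosen arbitrarily.

```python
from itertools import combinations, product
from math import comb

def lhs(F, n, d, m):
    """Sum of F(k restricted to S) over all k in {0..n-1}^d and all m-subsets S."""
    total = 0
    for k in product(range(n), repeat=d):
        for S in combinations(range(d), m):
            total += F(tuple(k[i] for i in S))
    return total

def rhs(F, n, d, m):
    """binom(d, m) * n^(d-m) * sum of F over {0..n-1}^m."""
    inner = sum(F(k) for k in product(range(n), repeat=m))
    return comb(d, m) * n ** (d - m) * inner

def F(k):
    """Arbitrary non-symmetric integer-valued test function of an m-tuple."""
    return sum((j + 1) * (x + 1) ** 2 for j, x in enumerate(k))

assert lhs(F, 3, 4, 2) == rhs(F, 3, 4, 2)
```

Dividing both sides by $n^{d-1/2}$ turns the factor $n^{d-m}$ into the normalization $n^{-(m-1/2)}$ appearing in (27).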

Now we need to investigate the sums involving $f_1$. We obtain from (27)
that

\[ V_n^{(d)} f_1 = \frac{d}{\sqrt{n}} \sum_{k=0}^{n-1} V^k R_1 f, \tag{28} \]

where $R_1 f$ has the representation

\[ R_1 f = g_1 - V^* g_1 \]

with $g_1$ denoting the sum of the series (25). This representation can
be transformed into

\[ R_1 f = (I - VV^*) g_1 + (V - I) V^* g_1, \tag{29} \]

where the first summand gives rise to an ergodic stationary sequence of
reversed square integrable martingale differences $(V^k (I - VV^*) g_1)_{k \ge 0}$
(hence satisfying the Billingsley-Ibragimov CLT), while the second summand
only contributes uniformly $L_2$-bounded functions to each of the sums
$\sum_{k=0}^{n-1} V^k R_1 f$. Since the sums are normalized by dividing by $\sqrt{n}$,
convergence to the normal distribution in 1) is established.
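
To make the martingale-coboundary splitting concrete, the Python sketch below carries it out for a mean-zero trigonometric polynomial under the doubling map $Tz = z^2$, where $V z^k = z^{2k}$ and $V^* z^k = z^{k/2}$ for even $k$ (zero for odd $k$). This model, the sample function `f` standing in for $R_1 f$, and all function names are assumptions made for illustration. In this setting the series $g = \sum_{k \ge 0} V^{*k} f$ is a finite sum.

```python
# Trigonometric polynomials are stored as {frequency: coefficient}.

def V(f):
    return {2 * k: c for k, c in f.items()}

def V_star(f):
    return {k // 2: c for k, c in f.items() if k % 2 == 0}

def add(f, g):
    out = dict(f)
    for k, c in g.items():
        out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def sub(f, g):
    return add(f, {k: -c for k, c in g.items()})

def norm_sq(f):
    """Squared L_2 norm via Parseval."""
    return sum(abs(c) ** 2 for c in f.values())

def g_series(f):
    """g = sum_{k >= 0} V*^k f; terminates because f has no constant term."""
    assert 0 not in f, "f must have mean zero"
    g, term = {}, dict(f)
    while term:
        g = add(g, term)
        term = V_star(term)
    return g

f = {1: 1, -1: 1, 2: 2, -2: 2, 4: -1, -4: -1}   # stands in for R_1 f
g = g_series(f)

martingale = sub(g, V(V_star(g)))          # (I - V V*) g
coboundary = sub(V(V_star(g)), V_star(g))  # (V - I) V* g

assert sub(g, V_star(g)) == f              # f = g - V* g
assert add(martingale, coboundary) == f    # decomposition as in (29)
assert V_star(martingale) == {}            # martingale difference property
sigma_sq = norm_sq(g) - norm_sq(V_star(g))
assert sigma_sq == norm_sq(martingale)     # |g|^2 - |V* g|^2 = |(I - V V*) g|^2
```

The last assertion reflects that $VV^*$ is an orthogonal projection in $L_2(\mu)$ and $V$ is an isometry, which is the identity behind the formula for $\sigma^2(f)$.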
The convergence of the second moments can be concluded as follows.
In the situation of the Billingsley-Ibragimov CLT we have

\[ \Bigl| \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} V^k (I - VV^*) g_1 \Bigr|_2^2
   = |(I - VV^*) g_1|_2^2 = |g_1|_2^2 - |V^* g_1|_2^2 = \sigma^2(f). \]

It follows from (28) and (29) that

\[ \bigl|\, |V_n^{(d)} f_1|_2 - d\, \sigma(f) \bigr| \le \frac{2d\, |g_1|_2}{\sqrt{n}}, \]

which implies 2) and, together with 3), the convergence of second moments. □

Under somewhat weaker assumptions we have the following central 
limit theorem with the convergence of the first absolute moment. 

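Theorem 3. Let $f \in L_1^{\mathrm{sym}}(\mu^d)$ be a real valued kernel, and

\[ f = \sum_{m=0}^{d} \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S \]

be the symmetric Hoeffding decomposition of $f$. Assume that

(1) the series

\[ \sum_{k=0}^{\infty} V^{*k} R_1 f \tag{30} \]

converges in $L_1(\mu)$, and the sums $\sum_{k=0}^{n} V^k R_1 f$ satisfy the relation

\[ \Bigl| \sum_{k=0}^{n} V^k R_1 f \Bigr|_1 = O(\sqrt{n}) \quad \text{as } n \to \infty; \tag{31} \]

(2) for every $m = 2, \dots, d$, $R_m f \in L_{m,\pi}^{\mathrm{sym}}(\mu^m)$, and the series

\[ \sum_{0 \le k_1 < \infty,\ \dots,\ 0 \le k_m < \infty} V^{*(k_1,\dots,k_m)} R_m f
   \quad \Bigl( := \lim_{n_1 \to \infty,\ \dots,\ n_m \to \infty}\ \sum_{0 \le k_1 \le n_1 - 1,\ \dots,\ 0 \le k_m \le n_m - 1} V^{*(k_1,\dots,k_m)} R_m f \Bigr) \tag{32} \]

converges in $L_{m,\pi}(\mu^m)$.

Then there exists $\sigma^2(f) \ge 0$ such that

\[ V_n^{(d)} f = \frac{1}{n^{d-1/2}} \sum_{0 \le k_1 \le n-1,\ \dots,\ 0 \le k_d \le n-1} D_d V^{(k_1,\dots,k_d)} (f - R_0 f) \]

converges in distribution to $N(0, d^2 \sigma^2(f))$. The convergence of the first
absolute moments

\[ E\bigl| V_n^{(d)} f \bigr| \to d \sqrt{2/\pi}\; \sigma(f) \quad (n \to \infty) \tag{33} \]

holds as well.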



Proof. The proof is parallel to that of Theorem 2, so we will concentrate
on the essential changes. Consider Hoeffding's decomposition of $f - R_0 f$

\[ f - R_0 f = \sum_{m=1}^{d} \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S = \sum_{m=1}^{d} f_m \]

with

\[ f_m = \sum_{S \in \mathcal{S}_d^m} (R_m f) \circ \pi_S, \qquad m = 1, \dots, d. \]

In order to prove the theorem it suffices to establish that

1) for some $\sigma(f) \ge 0$, $V_n^{(d)} f_1$ converges in distribution to $N(0, d^2 \sigma^2(f))$,

2) $|V_n^{(d)} f_1|_1 \to d \sqrt{2/\pi}\; \sigma(f)$ as $n \to \infty$,

3) $\bigl| V_n^{(d)} \sum_{m=2}^{d} f_m \bigr|_1 \to 0$ as $n \to \infty$.

Similar to the proof of Theorem 2 the functions $f_m$, $1 \le m \le d$, can
be shown to satisfy the inequality

\[ |V_n^{(d)} f_m|_1 \le C_m \binom{d}{m} n^{(1-m)/2} |g_m|_{m,m,\pi}, \]

where $g_m$ denotes the sum of the series (30) ($m = 1$) or (32) ($m \ge 2$).
For $m \ge 2$ the latter bound implies the convergence in $L_1(\mu)$ to zero,
proving 3).
Again, we obtain

\[ V_n^{(d)} f_1 = \frac{d}{\sqrt{n}} \sum_{k=0}^{n-1} V^k R_1 f, \]

where $R_1 f$ has the representation

\[ R_1 f = g_1 - V^* g_1 \]

with $g_1 \in L_1(\mu)$ denoting the sum of the series (30). As in the proof of
Theorem 2, $R_1 f$ can be represented in the form

\[ R_1 f = (I - VV^*) g_1 + (V - I) V^* g_1, \]

where the first summand defines an ergodic stationary sequence of
reversed martingale differences $(V^k (I - VV^*) g_1)_{k \ge 0}$, while the second one
only contributes a uniformly $L_1$-bounded amount to each of the sums
$\sum_{k=0}^{n-1} V^k R_1 f$.

However, now we only have $(I - VV^*) g_1 \in L_1(\mu)$, while we need
$(I - VV^*) g_1 \in L_2(\mu)$ to apply the Billingsley-Ibragimov CLT. The
latter can be concluded, as suggested in [28], from (31) using another
Burkholder's inequality (Theorem 8 in [12]) and the ergodic theorem
(see [H] for details). This proves the convergence in distribution. The
convergence of the first moments can be concluded similarly to the
corresponding part in the proof of Theorem 2. □



8. A LIMIT THEOREM FOR CANONICAL KERNELS OF DEGREE 2 

Apart from non-degenerate kernels of the previous section, a different
type of von Mises statistic arises from canonical symmetric (also
known as totally degenerate) kernels of degree $d \ge 2$ (one can identify
the order and the degree when dealing with canonical functions).
There are two approaches to the description of the limit behavior of a
$V$-statistic defined by a symmetric canonical function.
The first approach works for arbitrary degree; the limit distribution is
that of some multiple Wiener integral, also representable as a multiple
integral with respect to the Brownian bridge ([25]).
The second approach treats the case $d = 2$ and is based on the
diagonalization of the symmetric kernel. In the present section we take the
latter approach, combining it with the martingale approximation. We
assume that $f = f_2$ in terms of Hoeffding's decomposition for symmetric
kernels (see the example in subsection 4.2). The following Proposition 6
is a corollary of Lemma 5 for $m = 2$. Let $\theta$ denote the involution
in $(X^2, \mathcal{F}^2, \mu^2)$ interchanging the multipliers in the Cartesian product.
We consider the spaces $L_{2,\pi}(\mu^2)$ and $L_{2,\pi}^{\mathrm{sym}}(\mu^2)$ as embedded in $L_2(\mu^2)$.

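Proposition 6. Let $f (= f_2) \in L_{2,\pi}^{\mathrm{sym}}(\mu^2)$ be a canonical kernel of degree
2. If the limit

\[ g = \lim_{n_1, n_2 \to \infty}\ \sum_{0 \le i_1 \le n_1 - 1,\ 0 \le i_2 \le n_2 - 1} V^{*(i_1,i_2)} f \tag{34} \]

exists in $L_{2,\pi}(\mu^2)$, then $f$ admits a unique representation of the form^5

\[ f = g^{\emptyset} + (V^{(1,0)} - I) g^{\{1\}} + (V^{(0,1)} - I) g^{\{2\}} + (V^{(1,0)} - I)(V^{(0,1)} - I) g^{\{1,2\}}, \tag{35} \]

where $g^{\emptyset}, g^{\{1,2\}} \in L_{2,\pi}^{\mathrm{sym}}(\mu^2)$, $g^{\{1\}}, g^{\{2\}} \in L_{2,\pi}(\mu^2)$, $g^{\{1\}} \circ \theta = g^{\{2\}}$,
$g^{\{2\}} \circ \theta = g^{\{1\}}$, and $E(g^{\emptyset} \mid T^{-(1,0)} \mathcal{F}^2) = 0$, $E(g^{\emptyset} \mid T^{-(0,1)} \mathcal{F}^2) = 0$,
$E(g^{\{1\}} \mid T^{-(0,1)} \mathcal{F}^2) = 0$, $E(g^{\{2\}} \mid T^{-(1,0)} \mathcal{F}^2) = 0$.

Proof. Take the decomposition of $f$ given by Lemma 5 for $m = 2$. Then
apply $\theta$ to the decomposition and use the uniqueness of the
decomposition. □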

The function $g^{\emptyset}$ is a kernel of a symmetric trace class integral
operator in $L_2(\mu)$. Hence, it admits an eigenfunction decomposition

\[ g^{\emptyset}(x_1, x_2) = \sum_{m=1}^{\infty} \lambda_m \varphi_m(x_1) \varphi_m(x_2), \tag{36} \]

where $(\varphi_m)_{m \ge 1}$ is a normalized orthogonal sequence in $L_2(\mu)$ and $(\lambda_m)_{m \ge 1}$
is a real sequence for which $\sum_{m=1}^{\infty} |\lambda_m| < \infty$, so that every $\lambda_m \ne 0$
appears in the sequence only finitely many times and the same holds for
the series (36). We will assume that $\lambda_m \ne 0$ for every $m$, so that $(\varphi_m)_{m \ge 1}$ is not
necessarily a basis in $L_2(\mu)$.

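Theorem 4. Let $f$ be a real-valued canonical kernel satisfying the
assumptions of Proposition 6. Then, as $n \to \infty$, the sequence of random
variables

\[ \frac{1}{n} \sum_{0 \le i_1 \le n-1,\ 0 \le i_2 \le n-1} D_2 V^{(i_1,i_2)} f \]

converges in distribution to

\[ \sum_{m=1}^{\infty} \lambda_m \eta_m^2, \]

where $(\eta_m)_{m \ge 1}$ is a sequence of independent standard Gaussian
variables. Moreover,

\[ E\Bigl( \frac{1}{n} \sum_{0 \le i_1 \le n-1,\ 0 \le i_2 \le n-1} D_2 V^{(i_1,i_2)} f \Bigr) \to \sum_{m=1}^{\infty} \lambda_m \quad (n \to \infty). \]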

5 Upper indices here follow Lemma 5.
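Proof. Using the decomposition (35), observe that

\[
\begin{aligned}
\frac{1}{n} \sum_{0 \le i_1 \le n-1,\ 0 \le i_2 \le n-1} D_2 V^{(i_1,i_2)} (f - g^{\emptyset})
 &= \frac{1}{n} D_2 (V^{(n,0)} - I)(V^{(0,n)} - I) g^{\{1,2\}} \\
 &\quad + \frac{1}{n} \Bigl( \sum_{0 \le i_1 \le n-1} D_2 V^{(i_1,0)} (V^{(0,n)} - I) g^{\{2\}}
   + \sum_{0 \le i_2 \le n-1} D_2 V^{(0,i_2)} (V^{(n,0)} - I) g^{\{1\}} \Bigr).
\end{aligned}
\]

The first summand tends to zero in $L_2(\mu)$ since

\[ |(V^{(n,0)} - I)(V^{(0,n)} - I) g^{\{1,2\}}|_{2,2,\pi} \le 4 |g^{\{1,2\}}|_{2,2,\pi} < \infty. \]

In the last summand we deal with two sums of reversed martingale
differences which are, consequently, orthogonal within each of these
sums. Hence, it follows that the norm of the last summand is bounded
above by

\[ \frac{2}{\sqrt{n}} \bigl( |g^{\{1\}}|_2 + |g^{\{2\}}|_2 \bigr) \le \frac{4 |g^{\{1\}}|_2}{\sqrt{n}} \to 0 \quad (n \to \infty), \]

which reduces the proof to the special case of the kernel $g^{\emptyset}$.
Let us show next that for every $m \ge 1$ such that $\lambda_m \ne 0$

\[ E(\varphi_m \mid T^{-1} \mathcal{F}) = 0. \]

For $\mu^2$-almost all $(x_1, x_2)$ we have

\[ 0 = E(g^{\emptyset} \mid T^{-(0,1)} \mathcal{F}^2)(x_1, x_2) = \sum_{m=1}^{\infty} \lambda_m \varphi_m(x_1) E(\varphi_m \mid T^{-1} \mathcal{F})(x_2), \]

which for every $m$ implies, via multiplying by $\varphi_m(x_1)$ and integrating
over $x_1$ with respect to $\mu$, that $\lambda_m E(\varphi_m \mid T^{-1} \mathcal{F})(x_2) = 0$. This implies

\[ E(\varphi_m \mid T^{-1} \mathcal{F}) = 0 \]

$\mu$-almost surely for every $m \ge 1$ with $\lambda_m \ne 0$.

Define now a random variable $\xi_N = \sum_{m=1}^{N} \lambda_m \eta_m^2$ and a truncated
kernel $g_N^{\emptyset}$ by setting

\[ g_N^{\emptyset}(x_1, x_2) = \sum_{m=1}^{N} \lambda_m \varphi_m(x_1) \varphi_m(x_2). \]

Observe that for every $N$ the assertion of the theorem on the convergence
in distribution and the convergence of the first moments holds
when $f$ is replaced by $g_N^{\emptyset}$ and the limit variable by $\xi_N$. Indeed, the
Billingsley-Ibragimov theorem applies to reversed $\mathbb{R}^N$-valued martingale
differences (this is straightforward with the Cramér-Wold device). So, the
random vectors

\[ \Bigl( \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} V^k \varphi_1, \dots, \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} V^k \varphi_N \Bigr) \]

converge in distribution to $(\eta_1, \dots, \eta_N)$ as $n \to \infty$. Hence, the random
variables

\[ \frac{1}{n} \sum_{0 \le i_1 \le n-1,\ 0 \le i_2 \le n-1} D_2 V^{(i_1,i_2)} g_N^{\emptyset}
   = \sum_{m=1}^{N} \lambda_m \Bigl( \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} V^k \varphi_m \Bigr)^2 \]

converge in distribution to $\sum_{m=1}^{N} \lambda_m \eta_m^2$ as $n \to \infty$. The convergence
of the first moments follows here from the convergence of the second
moments in the CLT for martingale differences.
To complete the proof we observe that, with $\xi = \sum_{m=1}^{\infty} \lambda_m \eta_m^2$ denoting
the limit variable of the theorem,

\[ |\xi - \xi_N|_1 = \Bigl| \sum_{m=N+1}^{\infty} \lambda_m \eta_m^2 \Bigr|_1 \le \sum_{m=N+1}^{\infty} |\lambda_m| \to 0 \quad (N \to \infty) \]

and

\[
\begin{aligned}
\Bigl| \frac{1}{n} \sum_{0 \le i_1 \le n-1,\ 0 \le i_2 \le n-1} D_2 V^{(i_1,i_2)} g^{\emptyset}
 - \frac{1}{n} \sum_{0 \le i_1 \le n-1,\ 0 \le i_2 \le n-1} D_2 V^{(i_1,i_2)} g_N^{\emptyset} \Bigr|_1
 &= \Bigl| \sum_{m=N+1}^{\infty} \lambda_m \Bigl( \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} V^k \varphi_m \Bigr)^2 \Bigr|_1 \\
 &\le \sum_{m=N+1}^{\infty} |\lambda_m|
\end{aligned}
\]

uniformly in $n$. □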

Example. Let $X = \{z \in \mathbb{C} : |z| = 1\}$, $\mu$ be the probability Haar
measure on $X$, $Tz = z^2$, $z \in X$. Clearly,

\[ (Vf)(x) = f(x^2), \qquad (V^* f)(x) = \frac{1}{2} \sum_{\{u : u^2 = x\}} f(u). \]
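
The two formulas can be checked numerically at a sample point of the circle. The Python sketch below implements $V$ and $V^*$ on callables and verifies that $V^* V = I$ and that $V^*$ halves even frequencies and kills odd ones. The sample point, the test function and the tolerance are arbitrary choices made for illustration.

```python
import cmath

def V(f):
    """(V f)(x) = f(x**2)."""
    return lambda x: f(x * x)

def V_star(f):
    """(V* f)(x) = average of f over the two square roots of x."""
    def averaged(x):
        u = cmath.sqrt(x)          # one square root; the other one is -u
        return 0.5 * (f(u) + f(-u))
    return averaged

x = cmath.exp(0.7j)                # a sample point on the unit circle
f = lambda w: w ** 3 + 2 * w ** 2 + 1 / w

# V* V = I, since both square roots of x are mapped back to x by squaring
assert abs(V_star(V(f))(x) - f(x)) < 1e-12
# V* z^4 = z^2 (even frequency is halved), V* z^3 = 0 (odd frequency vanishes)
assert abs(V_star(lambda w: w ** 4)(x) - x ** 2) < 1e-12
assert abs(V_star(lambda w: w ** 3)(x)) < 1e-12
```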

If $f_1 \in L_2(\mu)$ and $\int_X f_1(x)\, \mu(dx) = 0$ then the series

\[ \sum_{k \ge 0} V^{*k} f_1 \]

converges in $L_2(\mu)$ under very mild conditions. For example, the
condition

fc>0 
is sufficient. Here $w^{(2)}(f_1, \delta)$ is the modulus of continuity of $f_1$ in $L_2(\mu)$.
Let now $f \in L_2(\mu^2)$ be of the form

\[ f(x_1, x_2) = g(x_1 x_2^{-1}) \]

with

\[ g(x) = \sum_{k \in \mathbb{Z}} g_k x^k \in L_2(\mu). \]

Assume that $f = f_2$ (that is, $f$ is canonical), real-valued and symmetric.
This means that $g_0 = 0$, $g_k$ are real and satisfy $g_{-k} = g_k$ for all $k \in \mathbb{Z}$.
Assume now, moreover, that $f_2 \in L_{2,\pi}^{\mathrm{sym}}(\mu^2)$. This is equivalent in our
setup to the relation

The condition of existence of the limit 

0<ii<n-l,0<i 2 <n-l 

in $L_{2,\pi}(\mu^2)$ is satisfied if the series

00 

fc=0 

is norm convergent in the space of absolutely convergent trigonometric 
series, that is 

nSZ fc>0 

The latter condition holds, for example, if for some $C > 0$ and $\delta > 0$

\[ |g_m| \le \frac{C}{|m| (\log |m|)^{1+\delta}} \]

for every $m \in \mathbb{Z}$, $m \ne 0$.